Commit Graph

975393 Commits

Author SHA1 Message Date
Håkon Bugge
11044f8c2c RDMA/cma: Fix incorrect Packet Lifetime calculation
[ Upstream commit e84045eab69c625bc0b0bf24d8e05bc65da1eed1 ]

An approximation for the PacketLifeTime is half the local ACK timeout.
The encoding for both timers are logarithmic.

If the local ACK timeout is set, but zero, it means the timer is
disabled. In this case, we choose the CMA_IBOE_PACKET_LIFETIME value,
since 50% of infinite makes no sense.

Before this commit, the PacketLifeTime became 255 if local ACK
timeout was zero (not running).

Fixed by explicitly testing for timeout being zero.

Fixes: e1ee1e62be ("RDMA/cma: Use ACK timeout for RoCE packetLifeTime")
Link: https://lore.kernel.org/r/1624371207-26710-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:29 +02:00
Gary Lin
fc2ea819b9 bpfilter: Specify the log level for the kmsg message
[ Upstream commit a196fa78a26571359740f701cf30d774eb8a72cb ]

Per the kmsg document [0], if we don't specify the log level with a
prefix "<N>" in the message string, the default log level will be
applied to the message. Since the default level could be warning(4),
this would make the log utility such as journalctl treat the message,
"Started bpfilter", as a warning. To avoid confusion, this commit
adds the prefix "<5>" to make the message always a notice.

  [0] https://www.kernel.org/doc/Documentation/ABI/testing/dev-kmsg

Fixes: 36c4357c63 ("net: bpfilter: print umh messages to /dev/kmsg")
Reported-by: Martin Loviska <mloviska@suse.com>
Signed-off-by: Gary Lin <glin@suse.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Link: https://lore.kernel.org/bpf/20210623040918.8683-1-glin@suse.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:29 +02:00
Vladimir Oltean
4228c00e14 net: dsa: sja1105: fix NULL pointer dereference in sja1105_reload_cbs()
[ Upstream commit be7f62eebaff2f86c1467a2d33930a0a7a87675b ]

priv->cbs is an array of priv->info->num_cbs_shapers elements of type
struct sja1105_cbs_entry which only get allocated if CONFIG_NET_SCH_CBS
is enabled.

However, sja1105_reload_cbs() is called from sja1105_static_config_reload()
which in turn is called for any of the items in sja1105_reset_reasons,
therefore during the normal runtime of the driver and not just from a
code path which can be triggered by the tc-cbs offload.

The sja1105_reload_cbs() function does not contain a check whether the
priv->cbs array is NULL or not, it just assumes it isn't and proceeds to
iterate through the credit-based shaper elements. This leads to a NULL
pointer dereference.

The solution is to return success if the priv->cbs array has not been
allocated, since sja1105_reload_cbs() has nothing to do.

Fixes: 4d7525085a ("net: dsa: sja1105: offload the Credit-Based Shaper qdisc")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:29 +02:00
Sasha Neftin
393d48b3de e1000e: Check the PCIm state
[ Upstream commit 2e7256f12cdb16eaa2515b6231d665044a07c51a ]

Complete to commit def4ec6dce ("e1000e: PCIm function state support")
Check the PCIm state only on CSME systems. There is no point to do this
check on non CSME systems.
This patch fixes a generation a false-positive warning:
"Error in exiting dmoff"

Fixes: def4ec6dce ("e1000e: PCIm function state support")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:29 +02:00
Eric Dumazet
cebff3d9f7 ipv6: fix out-of-bound access in ip6_parse_tlv()
[ Upstream commit 624085a31c1ad6a80b1e53f686bf6ee92abbf6e8 ]

First problem is that optlen is fetched without checking
there is more than one byte to parse.

Fix this by taking care of IPV6_TLV_PAD1 before
fetching optlen (under appropriate sanity checks against len)

Second problem is that IPV6_TLV_PADN checks of zero
padding are performed before the check of remaining length.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Fixes: c1412fce7e ("net/ipv6/exthdrs.c: Strict PadN option checking")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:28 +02:00
Antoine Tenart
9e753c47b9 net: atlantic: fix the macsec key length
[ Upstream commit d67fb4772d9a6cfd10f1109f0e7b1e6eb58c8e16 ]

The key length used to store the macsec key was set to MACSEC_KEYID_LEN
(16), which is an issue as:
- This was never meant to be the key length.
- The key length can be > 16.

Fix this by using MACSEC_MAX_KEY_LEN instead (the max length accepted in
uAPI).

Fixes: 27736563ce ("net: atlantic: MACSec egress offload implementation")
Fixes: 9ff40a751a ("net: atlantic: MACSec ingress offload implementation")
Reported-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:28 +02:00
Antoine Tenart
1c95d4d432 net: phy: mscc: fix macsec key length
[ Upstream commit c309217f91f2d2097c2a0a832d9bff50b88c81dc ]

The key length used to store the macsec key was set to MACSEC_KEYID_LEN
(16), which is an issue as:
- This was never meant to be the key length.
- The key length can be > 16.

Fix this by using MACSEC_MAX_KEY_LEN instead (the max length accepted in
uAPI).

Fixes: 28c5107aa9 ("net: phy: mscc: macsec support")
Reported-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:28 +02:00
Antoine Tenart
711a28d24d net: macsec: fix the length used to copy the key for offloading
[ Upstream commit 1f7fe5121127e037b86592ba42ce36515ea0e3f7 ]

The key length used when offloading macsec to Ethernet or PHY drivers
was set to MACSEC_KEYID_LEN (16), which is an issue as:
- This was never meant to be the key length.
- The key length can be > 16.

Fix this by using MACSEC_MAX_KEY_LEN to store the key (the max length
accepted in uAPI) and secy->key_len to copy it.

Fixes: 3cf3227a21 ("net: macsec: hardware offloading infrastructure")
Reported-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:28 +02:00
Håkon Bugge
c764f2d899 RDMA/cma: Protect RMW with qp_mutex
[ Upstream commit ca0c448d2b9f43e3175835d536853854ef544e22 ]

The struct rdma_id_private contains three bit-fields, tos_set,
timeout_set, and min_rnr_timer_set. These are set by accessor functions
without any synchronization. If two or all accessor functions are invoked
in close proximity in time, there will be Read-Modify-Write from several
contexts to the same variable, and the result will be intermittent.

Fixed by protecting the bit-fields by the qp_mutex in the accessor
functions.

The consumer of timeout_set and min_rnr_timer_set is in
rdma_init_qp_attr(), which is called with qp_mutex held for connected
QPs. Explicit locking is added for the consumers of tos and tos_set.

This commit depends on ("RDMA/cma: Remove unnecessary INIT->INIT
transition"), since the call to rdma_init_qp_attr() from
cma_init_conn_qp() does not hold the qp_mutex.

Fixes: 2c1619edef ("IB/cma: Define option to set ack timeout and pack tos_set")
Fixes: 3aeffc46afde ("IB/cma: Introduce rdma_set_min_rnr_timer()")
Link: https://lore.kernel.org/r/1624369197-24578-3-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:28 +02:00
Sukadev Bhattiprolu
d52ceed845 ibmvnic: free tx_pool if tso_pool alloc fails
[ Upstream commit f6ebca8efa52e4ae770f0325d618e7bcf08ada0c ]

Free tx_pool and clear it, if allocation of tso_pool fails.

release_tx_pools() assumes we have both tx and tso_pools if ->tx_pool is
non-NULL. If allocation of tso_pool fails in init_tx_pools(), the assumption
will not be true and we would end up dereferencing ->tx_buff, ->free_map
fields from a NULL pointer.

Fixes: 3205306c6b ("ibmvnic: Update TX pool initialization routine")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:28 +02:00
Sukadev Bhattiprolu
f25accc4fd ibmvnic: set ltb->buff to NULL after freeing
[ Upstream commit 552a33729f1a7cc5115d0752064fe9abd6e3e336 ]

free_long_term_buff() checks ltb->buff to decide whether we have a long
term buffer to free. So set ltb->buff to NULL afer freeing. While here,
also clear ->map_id, fix up some coding style and log an error.

Fixes: 9c4eaabd1b ("Check CRQ command return codes")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:28 +02:00
Dany Madden
3f85d2ca32 Revert "ibmvnic: remove duplicate napi_schedule call in open function"
[ Upstream commit 2ca220f92878470c6ba03f9946e412323093cc94 ]

This reverts commit 7c451f3ef676c805a4b77a743a01a5c21a250a73.

When a vnic interface is taken down and then up, connectivity is not
restored. We bisected it to this commit. Reverting this commit until
we can fully investigate the issue/benefit of the change.

Fixes: 7c451f3ef676 ("ibmvnic: remove duplicate napi_schedule call in open function")
Reported-by: Cristobal Forno <cforno12@linux.ibm.com>
Reported-by: Abdul Haleem <abdhalee@in.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:28 +02:00
Jan Sokolowski
cad22e48ca i40e: Fix missing rtnl locking when setting up pf switch
[ Upstream commit 956e759d5f8e0859e86b951a8779c60af633aafd ]

A recent change that made i40e use new udp_tunnel infrastructure
uses a method that expects to be called under rtnl lock.

However, not all codepaths do the lock prior to calling
i40e_setup_pf_switch.

Fix that by adding additional rtnl locking and unlocking.

Fixes: 40a98cb6f0 ("i40e: convert to new udp_tunnel infrastructure")
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:27 +02:00
Mateusz Palczewski
d9a5d19706 i40e: Fix autoneg disabling for non-10GBaseT links
[ Upstream commit 9262793e59f0423437166a879a73d056b1fe6f9a ]

Disabling autonegotiation was allowed only for 10GBaseT PHY.
The condition was changed to check if link media type is BaseT.

Fixes: 3ce12ee9d8 ("i40e: Fix order of checks when enabling/disabling autoneg in ethtool")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Karen Sornek <karen.sornek@intel.com>
Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:27 +02:00
Dinghao Liu
88819239e9 i40e: Fix error handling in i40e_vsi_open
[ Upstream commit 9c04cfcd4aad232e36306cdc5c74cd9fc9148a7e ]

When vsi->type == I40E_VSI_FDIR, we have caught the return value of
i40e_vsi_request_irq() but without further handling. Check and execute
memory clean on failure just like the other i40e_vsi_request_irq().

Fixes: 8a9eb7d3cb ("i40e: rework fdir setup and teardown")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:27 +02:00
Maciej Żenczykowski
aefa927744 bpf: Do not change gso_size during bpf_skb_change_proto()
[ Upstream commit 364745fbe981a4370f50274475da4675661104df ]

This is technically a backwards incompatible change in behaviour, but I'm
going to argue that it is very unlikely to break things, and likely to fix
*far* more then it breaks.

In no particular order, various reasons follow:

(a) I've long had a bug assigned to myself to debug a super rare kernel crash
on Android Pixel phones which can (per stacktrace) be traced back to BPF clat
IPv6 to IPv4 protocol conversion causing some sort of ugly failure much later
on during transmit deep in the GSO engine, AFAICT precisely because of this
change to gso_size, though I've never been able to manually reproduce it. I
believe it may be related to the particular network offload support of attached
USB ethernet dongle being used for tethering off of an IPv6-only cellular
connection. The reason might be we end up with more segments than max permitted,
or with a GSO packet with only one segment... (either way we break some
assumption and hit a BUG_ON)

(b) There is no check that the gso_size is > 20 when reducing it by 20, so we
might end up with a negative (or underflowing) gso_size or a gso_size of 0.
This can't possibly be good. Indeed this is probably somehow exploitable (or
at least can result in a kernel crash) by delivering crafted packets and perhaps
triggering an infinite loop or a divide by zero... As a reminder: gso_size (MSS)
is related to MTU, but not directly derived from it: gso_size/MSS may be
significantly smaller then one would get by deriving from local MTU. And on
some NICs (which do loose MTU checking on receive, it may even potentially be
larger, for example my work pc with 1500 MTU can receive 1520 byte frames [and
sometimes does due to bugs in a vendor plat46 implementation]). Indeed even just
going from 21 to 1 is potentially problematic because it increases the number
of segments by a factor of 21 (think DoS, or some other crash due to too many
segments).

(c) It's always safe to not increase the gso_size, because it doesn't result in
the max packet size increasing.  So the skb_increase_gso_size() call was always
unnecessary for correctness (and outright undesirable, see later). As such the
only part which is potentially dangerous (ie. could cause backwards compatibility
issues) is the removal of the skb_decrease_gso_size() call.

(d) If the packets are ultimately destined to the local device, then there is
absolutely no benefit to playing around with gso_size. It only matters if the
packets will egress the device. ie. we're either forwarding, or transmitting
from the device.

(e) This logic only triggers for packets which are GSO. It does not trigger for
skbs which are not GSO. It will not convert a non-GSO MTU sized packet into a
GSO packet (and you don't even know what the MTU is, so you can't even fix it).
As such your transmit path must *already* be able to handle an MTU 20 bytes
larger then your receive path (for IPv4 to IPv6 translation) - and indeed 28
bytes larger due to IPv4 fragments. Thus removing the skb_decrease_gso_size()
call doesn't actually increase the size of the packets your transmit side must
be able to handle. ie. to handle non-GSO max-MTU packets, the IPv4/IPv6 device/
route MTUs must already be set correctly. Since for example with an IPv4 egress
MTU of 1500, IPv4 to IPv6 translation will already build 1520 byte IPv6 frames,
so you need a 1520 byte device MTU. This means if your IPv6 device's egress
MTU is 1280, your IPv4 route must be 1260 (and actually 1252, because of the
need to handle fragments). This is to handle normal non-GSO packets. Thus the
reduction is simply not needed for GSO packets, because when they're correctly
built, they will already be the right size.

(f) TSO/GSO should be able to exactly undo GRO: the number of packets (TCP
segments) should not be modified, so that TCP's MSS counting works correctly
(this matters for congestion control). If protocol conversion changes the
gso_size, then the number of TCP segments may increase or decrease. Packet loss
after protocol conversion can result in partial loss of MSS segments that the
sender sent. How's the sending TCP stack going to react to receiving ACKs/SACKs
in the middle of the segments it sent?

(g) skb_{decrease,increase}_gso_size() are already no-ops for GSO_BY_FRAGS
case (besides triggering WARN_ON_ONCE). This means you already cannot guarantee
that gso_size (and thus resulting packet MTU) is changed. ie. you must assume
it won't be changed.

(h) changing gso_size is outright buggy for UDP GSO packets, where framing
matters (I believe that's also the case for SCTP, but it's already excluded
by [g]).  So the only remaining case is TCP, which also doesn't want it
(see [f]).

(i) see also the reasoning on the previous attempt at fixing this
(commit fa7b83bf3b156c767f3e4a25bbf3817b08f3ff8e) which shows that the current
behaviour causes TCP packet loss:

  In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the
  coalesced packet payload can be > MSS, but < MSS + 20.

  bpf_skb_proto_6_to_4() will upgrade the MSS and it can be > the payload
  length. After then tcp_gso_segment checks for the payload length if it
  is <= MSS. The condition is causing the packet to be dropped.

  tcp_gso_segment():
    [...]
    mss = skb_shinfo(skb)->gso_size;
    if (unlikely(skb->len <= mss)) goto out;
    [...]

Thus changing the gso_size is simply a very bad idea. Increasing is unnecessary
and buggy, and decreasing can go negative.

Fixes: 6578171a7f ("bpf: add bpf_skb_change_proto helper")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dongseok Yi <dseok.yi@samsung.com>
Cc: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/CANP3RGfjLikQ6dg=YpBU0OeHvyv7JOki7CyOUS9modaXAi-9vQ@mail.gmail.com
Link: https://lore.kernel.org/bpf/20210617000953.2787453-2-zenczykowski@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:27 +02:00
Norbert Slusarek
0dac8b0ad0 can: j1939: j1939_sk_setsockopt(): prevent allocation of j1939 filter for optlen == 0
[ Upstream commit aaf473d0100f64abc88560e2bea905805bcf2a8e ]

If optval != NULL and optlen == 0 are specified for SO_J1939_FILTER in
j1939_sk_setsockopt(), memdup_sockptr() will return ZERO_PTR for 0
size allocation. The new filter will be mistakenly assigned ZERO_PTR.
This patch checks for optlen != 0 and filter will be assigned NULL in
case of optlen == 0.

Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Link: https://lore.kernel.org/r/20210620123842.117975-1-nslusarek@gmx.net
Signed-off-by: Norbert Slusarek <nslusarek@gmx.net>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:27 +02:00
Eric Dumazet
2d58a38275 ipv6: exthdrs: do not blindly use init_net
[ Upstream commit bcc3f2a829b9edbe3da5fb117ee5a63686d31834 ]

I see no reason why max_dst_opts_cnt and max_hbh_opts_cnt
are fetched from the initial net namespace.

The other sysctls (max_dst_opts_len & max_hbh_opts_len)
are in fact already using the current ns.

Note: it is not clear why ipv6_destopt_rcv() use two ways to
get to the netns :

 1) dev_net(dst->dev)
    Originally used to increment IPSTATS_MIB_INHDRERRORS

 2) dev_net(skb->dev)
     Tom used this variant in his patch.

Maybe this calls to use ipv6_skb_net() instead ?

Fixes: 47d3d7ac65 ("ipv6: Implement limits on Hop-by-Hop and Destination options")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@quantonium.net>
Cc: Coco Li <lixiaoyan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:27 +02:00
Jian-Hong Pan
b559d003f0 net: bcmgenet: Fix attaching to PYH failed on RPi 4B
[ Upstream commit b2ac9800cfe0f8da16abc4e74e003440361c112e ]

The Broadcom UniMAC MDIO bus from mdio-bcm-unimac module comes too late.
So, GENET cannot find the ethernet PHY on UniMAC MDIO bus. This leads
GENET fail to attach the PHY as following log:

bcmgenet fd580000.ethernet: GENET 5.0 EPHY: 0x0000
...
could not attach to PHY
bcmgenet fd580000.ethernet eth0: failed to connect to PHY
uart-pl011 fe201000.serial: no DMA platform data
libphy: bcmgenet MII bus: probed
...
unimac-mdio unimac-mdio.-19: Broadcom UniMAC MDIO bus

This patch adds the soft dependency to load mdio-bcm-unimac module
before genet module to avoid the issue.

Fixes: 9a4e796970 ("net: bcmgenet: utilize generic Broadcom UniMAC MDIO controller driver")
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=213485
Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:27 +02:00
Ping-Ke Shih
514c96bf65 mac80211: remove iwlwifi specific workaround NDPs of null_response
[ Upstream commit 744757e46bf13ec3a7b3507d17ab3faab9516d43 ]

Remove the remaining workaround that is not removed by the
commit e41eb3e408de ("mac80211: remove iwlwifi specific workaround
that broke sta NDP tx")

Fixes: 41cbb0f5a2 ("mac80211: add support for HE")
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://lore.kernel.org/r/20210623134826.10318-1-pkshih@realtek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:26 +02:00
Zhen Lei
1b3985aa53 drm/msm/dpu: Fix error return code in dpu_mdss_init()
[ Upstream commit e020ac961ce5d038de66dc7f6ffca98899e9a3f3 ]

The error code returned by platform_get_irq() is stored in 'irq', it's
forgotten to be copied to 'ret' before being returned. As a result, the
value 0 of 'ret' is returned incorrectly.

After the above fix is completed, initializing the local variable 'ret'
to 0 is no longer needed, remove it.

In addition, when dpu_mdss_init() is successfully returned, the value of
'ret' is always 0. Therefore, replace "return ret" with "return 0" to make
the code clearer.

Fixes: 070e64dc1b ("drm/msm/dpu: Convert to a chained irq chip")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210510063805.3262-2-thunder.leizhen@huawei.com
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:26 +02:00
Zhen Lei
134a561aee drm/msm: Fix error return code in msm_drm_init()
[ Upstream commit a1c9b1e3bdd6d8dc43c18699772fb6cf4497d45a ]

Fix to return a negative error code from the error handling case instead
of 0, as done elsewhere in this function.

Fixes: 7f9743abaa ("drm/msm: validate display and event threads")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20210508022836.1777-1-thunder.leizhen@huawei.com
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:26 +02:00
John Fastabend
f97b9c4c07 bpf: Fix null ptr deref with mixed tail calls and subprogs
[ Upstream commit 7506d211b932870155bcb39e3dd9e39fab45a7c7 ]

The sub-programs prog->aux->poke_tab[] is populated in jit_subprogs() and
then used when emitting 'BPF_JMP|BPF_TAIL_CALL' insn->code from the
individual JITs. The poke_tab[] to use is stored in the insn->imm by
the code adding it to that array slot. The JIT then uses imm to find the
right entry for an individual instruction. In the x86 bpf_jit_comp.c
this is done by calling emit_bpf_tail_call_direct with the poke_tab[]
of the imm value.

However, we observed the below null-ptr-deref when mixing tail call
programs with subprog programs. For this to happen we just need to
mix bpf-2-bpf calls and tailcalls with some extra calls or instructions
that would be patched later by one of the fixup routines. So whats
happening?

Before the fixup_call_args() -- where the jit op is done -- various
code patching is done by do_misc_fixups(). This may increase the
insn count, for example when we patch map_lookup_up using map_gen_lookup
hook. This does two things. First, it means the instruction index,
insn_idx field, of a tail call instruction will move by a 'delta'.

In verifier code,

 struct bpf_jit_poke_descriptor desc = {
  .reason = BPF_POKE_REASON_TAIL_CALL,
  .tail_call.map = BPF_MAP_PTR(aux->map_ptr_state),
  .tail_call.key = bpf_map_key_immediate(aux),
  .insn_idx = i + delta,
 };

Then subprog start values subprog_info[i].start will be updated
with the delta and any poke descriptor index will also be updated
with the delta in adjust_poke_desc(). If we look at the adjust
subprog starts though we see its only adjusted when the delta
occurs before the new instructions,

        /* NOTE: fake 'exit' subprog should be updated as well. */
        for (i = 0; i <= env->subprog_cnt; i++) {
                if (env->subprog_info[i].start <= off)
                        continue;

Earlier subprograms are not changed because their start values
are not moved. But, adjust_poke_desc() does the offset + delta
indiscriminately. The result is poke descriptors are potentially
corrupted.

Then in jit_subprogs() we only populate the poke_tab[]
when the above insn_idx is less than the next subprogram start. From
above we corrupted our insn_idx so we might incorrectly assume a
poke descriptor is not used in a subprogram omitting it from the
subprogram. And finally when the jit runs it does the deref of poke_tab
when emitting the instruction and crashes with below. Because earlier
step omitted the poke descriptor.

The fix is straight forward with above context. Simply move same logic
from adjust_subprog_starts() into adjust_poke_descs() and only adjust
insn_idx when needed.

[   82.396354] bpf_testmod: version magic '5.12.0-rc2alu+ SMP preempt mod_unload ' should be '5.12.0+ SMP preempt mod_unload '
[   82.623001] loop10: detected capacity change from 0 to 8
[   88.487424] ==================================================================
[   88.487438] BUG: KASAN: null-ptr-deref in do_jit+0x184a/0x3290
[   88.487455] Write of size 8 at addr 0000000000000008 by task test_progs/5295
[   88.487471] CPU: 7 PID: 5295 Comm: test_progs Tainted: G          I       5.12.0+ #386
[   88.487483] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
[   88.487490] Call Trace:
[   88.487498]  dump_stack+0x93/0xc2
[   88.487515]  kasan_report.cold+0x5f/0xd8
[   88.487530]  ? do_jit+0x184a/0x3290
[   88.487542]  do_jit+0x184a/0x3290
 ...
[   88.487709]  bpf_int_jit_compile+0x248/0x810
 ...
[   88.487765]  bpf_check+0x3718/0x5140
 ...
[   88.487920]  bpf_prog_load+0xa22/0xf10

Fixes: a748c6975d ("bpf: propagate poke descriptors to subprograms")
Reported-by: Jussi Maki <joamaki@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:26 +02:00
Eric Dumazet
56c31bc9aa ieee802154: hwsim: avoid possible crash in hwsim_del_edge_nl()
[ Upstream commit 0303b30375dff5351a79cc2c3c87dfa4fda29bed ]

Both MAC802154_HWSIM_ATTR_RADIO_ID and MAC802154_HWSIM_ATTR_RADIO_EDGE
must be present to avoid a crash.

Fixes: f25da51fdc ("ieee802154: hwsim: add replacement for fakelb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: Stefan Schmidt <stefan@datenfreihafen.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20210621180244.882076-1-eric.dumazet@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:26 +02:00
Dongliang Mu
71a345ede5 ieee802154: hwsim: Fix memory leak in hwsim_add_one
[ Upstream commit 28a5501c3383f0e6643012c187b7c2027ef42aea ]

No matter from hwsim_remove or hwsim_del_radio_nl, hwsim_del fails to
remove the entry in the edges list. Take the example below, phy0, phy1
and e0 will be deleted, resulting in e1 not freed and accessed in the
future.

              hwsim_phys
                  |
    ------------------------------
    |                            |
phy0 (edges)                 phy1 (edges)
   ----> e1 (idx = 1)             ----> e0 (idx = 0)

Fix this by deleting and freeing all the entries in the edges list
between hwsim_edge_unsubscribe_me and list_del(&phy->list).

Reported-by: syzbot+b80c9959009a9325cdff@syzkaller.appspotmail.com
Fixes: 1c9f4a3fce ("ieee802154: hwsim: fix rcu handling")
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Acked-by: Alexander Aring <aahringo@redhat.com>
Link: https://lore.kernel.org/r/20210616020901.2759466-1-mudongliangabcd@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:26 +02:00
Marcelo Ricardo Leitner
4b44486b8b tc-testing: fix list handling
[ Upstream commit b4fd096cbb871340be837491fa1795864a48b2d9 ]

python lists don't have an 'add' method, but 'append'.

Fixes: 14e5175e9e ("tc-testing: introduce scapyPlugin for basic traffic")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:26 +02:00
Vignesh Raghavendra
997285646a net: ti: am65-cpsw-nuss: Fix crash when changing number of TX queues
[ Upstream commit ce8eb4c728ef40b554b4f3d8963f11ed44502e00 ]

When changing number of TX queues using ethtool:

	# ethtool -L eth0 tx 1
	[  135.301047] Unable to handle kernel paging request at virtual address 00000000af5d0000
	[...]
	[  135.525128] Call trace:
	[  135.525142]  dma_release_from_dev_coherent+0x2c/0xb0
	[  135.525148]  dma_free_attrs+0x54/0xe0
	[  135.525156]  k3_cppi_desc_pool_destroy+0x50/0xa0
	[  135.525164]  am65_cpsw_nuss_remove_tx_chns+0x88/0xdc
	[  135.525171]  am65_cpsw_set_channels+0x3c/0x70
	[...]

This is because k3_cppi_desc_pool_destroy() which is called after
k3_udma_glue_release_tx_chn() in am65_cpsw_nuss_remove_tx_chns()
references struct device that is unregistered at the end of
k3_udma_glue_release_tx_chn()

Therefore the right order is to call k3_cppi_desc_pool_destroy() and
destroy desc pool before calling k3_udma_glue_release_tx_chn().
Fix this throughout the driver.

Fixes: 93a7653031 ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:26 +02:00
Miao Wang
6610d5a73b net/ipv4: swap flow ports when validating source
[ Upstream commit c69f114d09891adfa3e301a35d9e872b8b7b5a50 ]

When doing source address validation, the flowi4 struct used for
fib_lookup should be in the reverse direction to the given skb.
fl4_dport and fl4_sport returned by fib4_rules_early_flow_dissect
should thus be swapped.

Fixes: 5a847a6e14 ("net/ipv4: Initialize proto and ports in flow struct")
Signed-off-by: Miao Wang <shankerwangmiao@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:25 +02:00
Jakub Kicinski
c3fcfc4e36 ip6_tunnel: fix GRE6 segmentation
[ Upstream commit a6e3f2985a80ef6a45a17d2d9d9151f17ea3ce07 ]

Commit 6c11fbf97e ("ip6_tunnel: add MPLS transmit support")
moved assiging inner_ipproto down from ipxip6_tnl_xmit() to
its callee ip6_tnl_xmit(). The latter is also used by GRE.

Since commit 3872035241 ("gre: Use inner_proto to obtain inner
header protocol") GRE had been depending on skb->inner_protocol
during segmentation. It sets it in gre_build_header() and reads
it in gre_gso_segment(). Changes to ip6_tnl_xmit() overwrite
the protocol, resulting in GSO skbs getting dropped.

Note that inner_protocol is a union with inner_ipproto,
GRE uses the former while the change switched it to the latter
(always setting it to just IPPROTO_GRE).

Restore the original location of skb_set_inner_ipproto(),
it is unclear why it was moved in the first place.

Fixes: 6c11fbf97e ("ip6_tunnel: add MPLS transmit support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:25 +02:00
Eric Dumazet
162e75687e vxlan: add missing rcu_read_lock() in neigh_reduce()
[ Upstream commit 85e8b032d6ebb0f698a34dd22c2f13443d905888 ]

syzbot complained in neigh_reduce(), because rcu_read_lock_bh()
is treated differently than rcu_read_lock()

WARNING: suspicious RCU usage
5.13.0-rc6-syzkaller #0 Not tainted
-----------------------------
include/net/addrconf.h:313 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
3 locks held by kworker/0:0/5:
 #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
 #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
 #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:617 [inline]
 #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
 #0: ffff888011064d38 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x871/0x1600 kernel/workqueue.c:2247
 #1: ffffc90000ca7da8 ((work_completion)(&port->wq)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1600 kernel/workqueue.c:2251
 #2: ffffffff8bf795c0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x1da/0x3130 net/core/dev.c:4180

stack backtrace:
CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.13.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events ipvlan_process_multicast
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 __in6_dev_get include/net/addrconf.h:313 [inline]
 __in6_dev_get include/net/addrconf.h:311 [inline]
 neigh_reduce drivers/net/vxlan.c:2167 [inline]
 vxlan_xmit+0x34d5/0x4c30 drivers/net/vxlan.c:2919
 __netdev_start_xmit include/linux/netdevice.h:4944 [inline]
 netdev_start_xmit include/linux/netdevice.h:4958 [inline]
 xmit_one net/core/dev.c:3654 [inline]
 dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3670
 __dev_queue_xmit+0x2133/0x3130 net/core/dev.c:4246
 ipvlan_process_multicast+0xa99/0xd70 drivers/net/ipvlan/ipvlan_core.c:287
 process_one_work+0x98d/0x1600 kernel/workqueue.c:2276
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2422
 kthread+0x3b1/0x4a0 kernel/kthread.c:313
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Fixes: f564f45c45 ("vxlan: add ipv6 proxy support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:25 +02:00
Po-Hao Huang
6cd23b5f40 rtw88: 8822c: fix lc calibration timing
[ Upstream commit 05684fd583e1acc34dddea283838fbfbed4904a0 ]

Before this patch, we use value from 2 seconds ago to decide
whether we should do lc calibration.
Although this don't happen frequently, fix flow to the way it should be.

Fixes: 7ae7784ec2a8 ("rtw88: 8822c: add LC calibration for RTL8822C")
Signed-off-by: Po-Hao Huang <phhuang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210426013252.5665-3-pkshih@realtek.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:25 +02:00
Luca Coelho
db2386fa43 iwlwifi: increase PNVM load timeout
[ Upstream commit 5cc816ef9db1fe03f73e56e9d8f118add9c6efe4 ]

The FW has a watchdog of 200ms in the PNVM load flow, so the driver
should have a slightly higher timeout.  Change the timeout from 100ms
to 250ms.

Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Fixes: 70d3ca86b0 ("iwlwifi: mvm: ring the doorbell and wait for PNVM load completion")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210612142637.ba22aec1e2be.I36bfadc28c480f4fc57266c075a79e8ea4a6934f@changeid
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:25 +02:00
Ayush Sawal
78e6587585 xfrm: Fix xfrm offload fallback fail case
[ Upstream commit dd72fadf2186fc8a6018f97fe72f4d5ca05df440 ]

In case of xfrm offload, if xdo_dev_state_add() of driver returns
-EOPNOTSUPP, xfrm offload fallback is failed.
In xfrm state_add() both xso->dev and xso->real_dev are initialized to
dev and when err(-EOPNOTSUPP) is returned only xso->dev is set to null.

So in this scenario the condition in func validate_xmit_xfrm(),
if ((x->xso.dev != dev) && (x->xso.real_dev == dev))
                return skb;
returns true, due to which skb is returned without calling esp_xmit()
below which has fallback code. Hence the CRYPTO_FALLBACK is failing.

So fixing this with by keeping x->xso.real_dev as NULL when err is
returned in func xfrm_dev_state_add().

Fixes: bdfd2d1fa7 ("bonding/xfrm: use real_dev instead of slave_dev")
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:25 +02:00
Eric Dumazet
5c8e5fecea pkt_sched: sch_qfq: fix qfq_change_class() error path
[ Upstream commit 0cd58e5c53babb9237b741dbef711f0a9eb6d3fd ]

If qfq_change_class() is unable to allocate memory for qfq_aggregate,
it frees the class that has been inserted in the class hash table,
but does not unhash it.

Defer the insertion after the problematic allocation.

BUG: KASAN: use-after-free in hlist_add_head include/linux/list.h:884 [inline]
BUG: KASAN: use-after-free in qdisc_class_hash_insert+0x200/0x210 net/sched/sch_api.c:731
Write of size 8 at addr ffff88814a534f10 by task syz-executor.4/31478

CPU: 0 PID: 31478 Comm: syz-executor.4 Not tainted 5.13.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233
 __kasan_report mm/kasan/report.c:419 [inline]
 kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436
 hlist_add_head include/linux/list.h:884 [inline]
 qdisc_class_hash_insert+0x200/0x210 net/sched/sch_api.c:731
 qfq_change_class+0x96c/0x1990 net/sched/sch_qfq.c:489
 tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113
 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929
 sock_sendmsg_nosec net/socket.c:654 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:674
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665d9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fdc7b5f0188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665d9
RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003
RBP: 00007fdc7b5f01d0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ffcf7310b3f R14: 00007fdc7b5f0300 R15: 0000000000022000

Allocated by task 31445:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:428 [inline]
 ____kasan_kmalloc mm/kasan/common.c:507 [inline]
 ____kasan_kmalloc mm/kasan/common.c:466 [inline]
 __kasan_kmalloc+0x9b/0xd0 mm/kasan/common.c:516
 kmalloc include/linux/slab.h:556 [inline]
 kzalloc include/linux/slab.h:686 [inline]
 qfq_change_class+0x705/0x1990 net/sched/sch_qfq.c:464
 tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113
 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929
 sock_sendmsg_nosec net/socket.c:654 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:674
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 31445:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
 ____kasan_slab_free mm/kasan/common.c:360 [inline]
 ____kasan_slab_free mm/kasan/common.c:325 [inline]
 __kasan_slab_free+0xfb/0x130 mm/kasan/common.c:368
 kasan_slab_free include/linux/kasan.h:212 [inline]
 slab_free_hook mm/slub.c:1583 [inline]
 slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1608
 slab_free mm/slub.c:3168 [inline]
 kfree+0xe5/0x7f0 mm/slub.c:4212
 qfq_change_class+0x10fb/0x1990 net/sched/sch_qfq.c:518
 tc_ctl_tclass+0x514/0xe50 net/sched/sch_api.c:2113
 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5564
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929
 sock_sendmsg_nosec net/socket.c:654 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:674
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
 entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff88814a534f00
 which belongs to the cache kmalloc-128 of size 128
The buggy address is located 16 bytes inside of
 128-byte region [ffff88814a534f00, ffff88814a534f80)
The buggy address belongs to the page:
page:ffffea0005294d00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x14a534
flags: 0x57ff00000000200(slab|node=1|zone=2|lastcpupid=0x7ff)
raw: 057ff00000000200 ffffea00004fee00 0000000600000006 ffff8880110418c0
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 29797, ts 604817765317, free_ts 604810151744
 prep_new_page mm/page_alloc.c:2358 [inline]
 get_page_from_freelist+0x1033/0x2b60 mm/page_alloc.c:3994
 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5200
 alloc_pages+0x18c/0x2a0 mm/mempolicy.c:2272
 alloc_slab_page mm/slub.c:1646 [inline]
 allocate_slab+0x2c5/0x4c0 mm/slub.c:1786
 new_slab mm/slub.c:1849 [inline]
 new_slab_objects mm/slub.c:2595 [inline]
 ___slab_alloc+0x4a1/0x810 mm/slub.c:2758
 __slab_alloc.constprop.0+0xa7/0xf0 mm/slub.c:2798
 slab_alloc_node mm/slub.c:2880 [inline]
 slab_alloc mm/slub.c:2922 [inline]
 __kmalloc+0x315/0x330 mm/slub.c:4050
 kmalloc include/linux/slab.h:561 [inline]
 kzalloc include/linux/slab.h:686 [inline]
 __register_sysctl_table+0x112/0x1090 fs/proc/proc_sysctl.c:1318
 mpls_dev_sysctl_register+0x1b7/0x2d0 net/mpls/af_mpls.c:1421
 mpls_add_dev net/mpls/af_mpls.c:1472 [inline]
 mpls_dev_notify+0x214/0x8b0 net/mpls/af_mpls.c:1588
 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83
 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2121
 call_netdevice_notifiers_extack net/core/dev.c:2133 [inline]
 call_netdevice_notifiers net/core/dev.c:2147 [inline]
 register_netdevice+0x106b/0x1500 net/core/dev.c:10312
 veth_newlink+0x585/0xac0 drivers/net/veth.c:1547
 __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3452
 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3500
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1298 [inline]
 free_pcp_prepare+0x223/0x300 mm/page_alloc.c:1342
 free_unref_page_prepare mm/page_alloc.c:3250 [inline]
 free_unref_page+0x12/0x1d0 mm/page_alloc.c:3298
 __vunmap+0x783/0xb60 mm/vmalloc.c:2566
 free_work+0x58/0x70 mm/vmalloc.c:80
 process_one_work+0x98d/0x1600 kernel/workqueue.c:2276
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2422
 kthread+0x3b1/0x4a0 kernel/kthread.c:313
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Memory state around the buggy address:
 ffff88814a534e00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88814a534e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88814a534f00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                         ^
 ffff88814a534f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88814a535000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Fixes: 462dbc9101 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:25 +02:00
Pablo Neira Ayuso
b2ce4ebdd9 netfilter: nf_tables_offload: check FLOW_DISSECTOR_KEY_BASIC in VLAN transfer logic
[ Upstream commit ea45fdf82cc90430bb7c280e5e53821e833782c5 ]

The VLAN transfer logic should actually check for
FLOW_DISSECTOR_KEY_BASIC, not FLOW_DISSECTOR_KEY_CONTROL. Moreover, do
not fallback to case 2) .n_proto is set to 802.1q or 802.1ad, if
FLOW_DISSECTOR_KEY_BASIC is unset.

Fixes: 783003f3bb8a ("netfilter: nftables_offload: special ethertype handling for VLAN")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:24 +02:00
Jakub Kicinski
581e37ad5c tls: prevent oversized sendfile() hangs by ignoring MSG_MORE
[ Upstream commit d452d48b9f8b1a7f8152d33ef52cfd7fe1735b0a ]

We got multiple reports that multi_chunk_sendfile test
case from tls selftest fails. This was sort of expected,
as the original fix was never applied (see it in the first
Link:). The test in question uses sendfile() with count
larger than the size of the underlying file. This will
make splice set MSG_MORE on all sendpage calls, meaning
TLS will never close and flush the last partial record.

Eric seem to have addressed a similar problem in
commit 35f9c09fe9 ("tcp: tcp_sendpages() should call tcp_push() once")
by introducing MSG_SENDPAGE_NOTLAST. Unlike MSG_MORE
MSG_SENDPAGE_NOTLAST is not set on the last call
of a "pipefull" of data (PIPE_DEF_BUFFERS == 16,
so every 16 pages or whenever we run out of data).

Having a break every 16 pages should be fine, TLS
can pack exactly 4 pages into a record, so for
aligned reads there should be no difference,
unaligned may see one extra record per sendpage().

Sticking to TCP semantics seems preferable to modifying
splice, but we can revisit it if real life scenarios
show a regression.

Reported-by: Vadim Fedorenko <vfedorenko@novek.ru>
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Link: https://lore.kernel.org/netdev/1591392508-14592-1-git-send-email-pooja.trivedi@stackpath.com/
Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:24 +02:00
Yunsheng Lin
e7c3ae4797 net: sched: add barrier to ensure correct ordering for lockless qdisc
[ Upstream commit 89837eb4b2463c556a123437f242d6c2bc62ce81 ]

The spin_trylock() was assumed to contain the implicit
barrier needed to ensure the correct ordering between
STATE_MISSED setting/clearing and STATE_MISSED checking
in commit a90c57f2cedd ("net: sched: fix packet stuck
problem for lockless qdisc").

But it turns out that spin_trylock() only has load-acquire
semantic, for strongly-ordered system(like x86), the compiler
barrier implicitly contained in spin_trylock() seems enough
to ensure the correct ordering. But for weakly-orderly system
(like arm64), the store-release semantic is needed to ensure
the correct ordering as clear_bit() and test_bit() is store
operation, see queued_spin_lock().

So add the explicit barrier to ensure the correct ordering
for the above case.

Fixes: a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:24 +02:00
Antoine Tenart
ca9b5ab791 vrf: do not push non-ND strict packets with a source LLA through packet taps again
[ Upstream commit 603113c514e95c3350598bc3cccbd03af7ea4ab2 ]

Non-ND strict packets with a source LLA go through the packet taps
again, while non-ND strict packets with other source addresses do not,
and we can see a clone of those packets on the vrf interface (we should
not). This is due to a series of changes:

Commit 6f12fa775530[1] made non-ND strict packets not being pushed again
in the packet taps. This changed with commit 205704c618af[2] for those
packets having a source LLA, as they need a lookup with the orig_iif.

The issue now is those packets do not skip the 'vrf_ip6_rcv' function to
the end (as the ones without a source LLA) and go through the check to
call packet taps again. This check was changed by commit 6f12fa775530[1]
and do not exclude non-strict packets anymore. Packets matching
'need_strict && !is_ndisc && is_ll_src' are now being sent through the
packet taps again. This can be seen by dumping packets on the vrf
interface.

Fix this by having the same code path for all non-ND strict packets and
selectively lookup with the orig_iif for those with a source LLA. This
has the effect to revert to the pre-205704c618af[2] condition, which
should also be easier to maintain.

[1] 6f12fa7755 ("vrf: mark skb for multicast or link-local as enslaved to VRF")
[2] 205704c618 ("vrf: packets with lladdr src needs dst at input with orig_iif when needs strict")

Fixes: 205704c618 ("vrf: packets with lladdr src needs dst at input with orig_iif when needs strict")
Cc: Stephen Suryaputra <ssuryaextr@gmail.com>
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:24 +02:00
Pavel Skripkin
e72d9e4b98 net: ethernet: ezchip: fix error handling
[ Upstream commit 0de449d599594f5472e00267d651615c7f2c6c1d ]

As documented at drivers/base/platform.c for platform_get_irq:

 * Gets an IRQ for a platform device and prints an error message if finding the
 * IRQ fails. Device drivers should check the return value for errors so as to
 * not pass a negative integer value to the request_irq() APIs.

So, the driver should check that platform_get_irq() return value
is _negative_, not that it's equal to zero, because -ENXIO (return
value from request_irq() if irq was not found) will
pass this check and it leads to passing negative irq to request_irq()

Fixes: 0dd0770936 ("NET: Add ezchip ethernet driver")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:24 +02:00
Pavel Skripkin
161f8b73da net: ethernet: ezchip: fix UAF in nps_enet_remove
[ Upstream commit e4b8700e07a86e8eab6916aa5c5ba99042c34089 ]

priv is netdev private data, but it is used
after free_netdev(). It can cause use-after-free when accessing priv
pointer. So, fix it by moving free_netdev() after netif_napi_del()
call.

Fixes: 0dd0770936 ("NET: Add ezchip ethernet driver")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:24 +02:00
Pavel Skripkin
f026d82211 net: ethernet: aeroflex: fix UAF in greth_of_remove
[ Upstream commit e3a5de6d81d8b2199935c7eb3f7d17a50a7075b7 ]

static int greth_of_remove(struct platform_device *of_dev)
{
...
	struct greth_private *greth = netdev_priv(ndev);
...
	unregister_netdev(ndev);
	free_netdev(ndev);

	of_iounmap(&of_dev->resource[0], greth->regs, resource_size(&of_dev->resource[0]));
...
}

greth is netdev private data, but it is used
after free_netdev(). It can cause use-after-free when accessing greth
pointer. So, fix it by moving free_netdev() after of_iounmap()
call.

Fixes: d4c41139df ("net: Add Aeroflex Gaisler 10/100/1G Ethernet MAC driver")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:24 +02:00
Lorenzo Bianconi
6a8c7c5c07 mt76: mt7615: fix NULL pointer dereference in tx_prepare_skb()
[ Upstream commit 8d3cdc1bbb1d355f0ebef973175ae5fd74286feb ]

Fix theoretical NULL pointer dereference in mt7615_tx_prepare_skb and
mt7663_usb_sdio_tx_prepare_skb routines. This issue has been identified
by code analysis.

Fixes: 6aa4ed7927 ("mt76: mt7615: implement DMA support for MT7622")
Fixes: 4bb586bc33 ("mt76: mt7663u: sync probe sampling with rate configuration")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:24 +02:00
Lorenzo Bianconi
6987ee9bf0 mt76: fix possible NULL pointer dereference in mt76_tx
[ Upstream commit d7400a2f3e295b8cee692c7a66e10f60015a3c37 ]

Even if this is not a real issue since mt76_tx is never run with wcid set
to NULL, fix a theoretical NULL pointer dereference in mt76_tx routine

Fixes: db9f11d343 ("mt76: store wcid tx rate info in one u32 reduce locking")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:23 +02:00
Wang Hai
e717f974ce samples/bpf: Fix the error return code of xdp_redirect's main()
[ Upstream commit 7c6090ee2a7b3315410cfc83a94c3eb057407b25 ]

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

If bpf_map_update_elem() failed, main() should return a negative error.

Fixes: 832622e6bd ("xdp: sample program for new bpf_redirect helper")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210616042534.315097-1-wanghai38@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:23 +02:00
Wang Hai
79bf8f04ce samples/bpf: Fix Segmentation fault for xdp_redirect command
[ Upstream commit 85102ba58b4125ebad941d7555c3c248b23efd16 ]

A Segmentation fault error is caused when the following command
is executed.

$ sudo ./samples/bpf/xdp_redirect lo
Segmentation fault

This command is missing a device <IFNAME|IFINDEX> as an argument, resulting
in out-of-bounds access from argv.

If the number of devices for the xdp_redirect parameter is not 2,
we should report an error and exit.

Fixes: 24251c2647 ("samples/bpf: add option for native and skb mode for redirect apps")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210616042324.314832-1-wanghai38@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:23 +02:00
Jack Wang
fcd8d6371a RDMA/rtrs-srv: Set minimal max_send_wr and max_recv_wr
[ Upstream commit 5e91eabf66c854f16ca2e954e5c68939bc81601e ]

Currently rtrs when create_qp use a coarse numbers (bigger in general),
which leads to hardware create more resources which only waste memory with
no benefits.

For max_send_wr, we don't really need alway max_qp_wr size when creating
qp, reduce it to cq_size.

For max_recv_wr,  cq_size is enough.

With the patch when sess_queue_depth=128, per session (2 paths) memory
consumption reduced from 188 MB to 65MB

When always_invalidate is enabled, we need send more wr, so treat it
special.

Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20210614090337.29557-2-jinpu.wang@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:23 +02:00
Tony Ambardar
a9355b201d bpf: Fix libelf endian handling in resolv_btfids
[ Upstream commit 61e8aeda9398925f8c6fc290585bdd9727d154c4 ]

The vmlinux ".BTF_ids" ELF section is declared in btf_ids.h to hold a list
of zero-filled BTF IDs, which is then patched at link-time with correct
values by resolv_btfids. The section is flagged as "allocable" to preclude
compression, but notably the section contents (BTF IDs) are untyped.

When patching the BTF IDs, resolve_btfids writes in host-native endianness
and relies on libelf for any required translation on reading and updating
vmlinux. However, since the type of the .BTF_ids section content defaults
to ELF_T_BYTE (i.e. unsigned char), no translation occurs. This results in
incorrect patched values when cross-compiling to non-native endianness,
and can manifest as kernel Oops and test failures which are difficult to
troubleshoot [1].

Explicitly set the type of patched data to ELF_T_WORD, the architecture-
neutral ELF type corresponding to the u32 BTF IDs. This enables libelf to
transparently perform any needed endian conversions.

Fixes: fbbb68de80 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Frank Eigler <fche@redhat.com>
Cc: Mark Wielaard <mark@klomp.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/CAPGftE_eY-Zdi3wBcgDfkz_iOr1KF10n=9mJHm1_a_PykcsoeA@mail.gmail.com [1]
Link: https://lore.kernel.org/bpf/20210618061404.818569-1-Tony.Ambardar@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:23 +02:00
Magnus Karlsson
607706027c xsk: Fix broken Tx ring validation
[ Upstream commit f654fae47e83e56b454fbbfd0af0a4f232e356d6 ]

Fix broken Tx ring validation for AF_XDP. The commit under the Fixes
tag, fixed an off-by-one error in the validation but introduced
another error. Descriptors are now let through even if they straddle a
chunk boundary which they are not allowed to do in aligned mode. Worse
is that they are let through even if they straddle the end of the umem
itself, tricking the kernel to read data outside the allowed umem
region which might or might not be mapped at all.

Fix this by reintroducing the old code, but subtract the length by one
to fix the off-by-one error that the original patch was
addressing. The test chunk != chunk_end makes sure packets do not
straddle chunk boundraries. Note that packets of zero length are
allowed in the interface, therefore the test if the length is
non-zero.

Fixes: ac31565c2193 ("xsk: Fix for xp_aligned_validate_desc() when len == chunk_size")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20210618075805.14412-1-magnus.karlsson@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:23 +02:00
Magnus Karlsson
cd7877a39f xsk: Fix missing validation for skb and unaligned mode
[ Upstream commit 2f99619820c2269534eb2c0cde44870313c6d353 ]

Fix a missing validation of a Tx descriptor when executing in skb mode
and the umem is in unaligned mode. A descriptor could point to a
buffer straddling the end of the umem, thus effectively tricking the
kernel to read outside the allowed umem region. This could lead to a
kernel crash if that part of memory is not mapped.

In zero-copy mode, the descriptor validation code rejects such
descriptors by checking a bit in the DMA address that tells us if the
next page is physically contiguous or not. For the last page in the
umem, this bit is not set, therefore any descriptor pointing to a
packet straddling this last page boundary will be rejected. However,
the skb path does not use this bit since it copies out data and can do
so to two different pages. (It also does not have the array of DMA
address, so it cannot even store this bit.) The code just returned
that the packet is always physically contiguous. But this is
unfortunately also returned for the last page in the umem, which means
that packets that cross the end of the umem are being allowed, which
they should not be.

Fix this by introducing a check for this in the SKB path only, not
penalizing the zero-copy path.

Fixes: 2b43470add ("xsk: Introduce AF_XDP buffer allocation API")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20210617092255.3487-1-magnus.karlsson@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:23 +02:00
Daniel Xu
89621945b6 selftests/bpf: Whitelist test_progs.h from .gitignore
[ Upstream commit 809ed84de8b3f2fd7b1d06efb94bf98fd318a7d7 ]

Somehow test_progs.h was being included by the existing rule:

    /test_progs*

This is bad because:

    1) test_progs.h is a checked in file
    2) grep-like tools like ripgrep[0] respect gitignore and
       test_progs.h was being hidden from searches

[0]: https://github.com/BurntSushi/ripgrep

Fixes: 74b5a5968f ("selftests/bpf: Replace test_progs and test_maps w/ general rule")
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/a46f64944bf678bc652410ca6028d3450f4f7f4b.1623880296.git.dxu@dxuuu.xyz
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14 16:56:23 +02:00