Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2019-08-24
An update from ieee802154 for your *net* tree.
Yue Haibing fixed two bugs discovered by KASAN in the hwsim driver for
ieee802154 and Colin Ian King cleaned up a redundant variable assignment.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2019-08-24
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix verifier precision tracking with BPF-to-BPF calls, from Alexei.
2) Fix a use-after-free in prog symbol exposure, from Daniel.
3) Several s390x JIT fixes plus BE related fixes in BPF kselftests, from Ilya.
4) Fix memory leak by unpinning XDP umem pages in error path, from Ivan.
5) Fix a potential use-after-free on flow dissector detach, from Jakub.
6) Fix bpftool to close prog fd after showing metadata, from Quentin.
7) BPF kselftest config and TEST_PROGS_EXTENDED fixes, from Anders.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
test_select_reuseport fails on s390 due to the verifier rejecting
test_select_reuseport_kern.o with the following message:
; data_check.eth_protocol = reuse_md->eth_protocol;
18: (69) r1 = *(u16 *)(r6 +22)
invalid bpf_context access off=22 size=2
This is because on big-endian machines casts from __u32 to __u16 are
generated by referencing the respective variable as __u16 with an offset
of 2 (as opposed to 0 on little-endian machines).
The verifier already has all the infrastructure in place to allow such
accesses, it's just that they are not explicitly enabled for the
eth_protocol field. Enable them for the eth_protocol field by using
bpf_ctx_range instead of offsetof.
Ditto for ip_protocol, bind_inany and len, since they already allow
narrowing, and the same problem can arise when working with them.
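A hedged sketch of the kind of check this enables in the sk_reuseport
is_valid_access callback (the function body is simplified and illustrative,
not the verbatim kernel source):

/* bpf_ctx_range() expands to a case range covering offsetof() through
 * offsetofend() - 1, so a 2-byte load at offset +2 into the 4-byte
 * eth_protocol field passes the check on big-endian machines too. */
static bool sk_reuseport_is_valid_access(int off, int size,
					 enum bpf_access_type type,
					 const struct bpf_prog *prog,
					 struct bpf_insn_access_aux *info)
{
	switch (off) {
	case bpf_ctx_range(struct sk_reuseport_md, eth_protocol):
		return size <= sizeof(__u32) && type == BPF_READ;
	default:
		return false;
	}
}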
Fixes: 2dbb9b9e6d ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
A call to bpf_prog_put(), with the help of call_rcu(), queues an RCU
callback to free the program once a grace period has elapsed. The callback
can run together with new RCU readers that started after the last grace
period. New RCU readers can potentially see the "old" to-be-freed or
already-freed pointer to the program object before the RCU update-side
NULLs it.
Reorder the operations so that the RCU update-side resets the protected
pointer before the end of the grace period after which the program will be
freed.
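A minimal sketch of the reordering described above (field and lock names
are illustrative, not the exact symbols in the tree):

static int flow_dissector_bpf_prog_detach(struct net *net)
{
	struct bpf_prog *attached;

	mutex_lock(&flow_dissector_mutex);
	attached = rcu_dereference_protected(net->flow_dissector_prog,
				lockdep_is_held(&flow_dissector_mutex));
	if (!attached) {
		mutex_unlock(&flow_dissector_mutex);
		return -ENOENT;
	}
	/* Reset the protected pointer first so new readers cannot pick up
	 * the program... */
	RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
	/* ...then drop the reference; bpf_prog_put() frees via call_rcu()
	 * after a grace period, covering readers that started earlier. */
	bpf_prog_put(attached);
	mutex_unlock(&flow_dissector_mutex);
	return 0;
}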
Fixes: d58e468b11 ("flow_dissector: implements flow dissector BPF hook")
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Petar Penkov <ppenkov@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently, ipv6_find_idev returns NULL when ipv6_add_dev fails,
ignoring the specific error value. This results in addrconf_add_dev
returning ENOBUFS in all cases, which is unfortunate in cases such as:
# ip link add dummyX type dummy
# ip link set dummyX mtu 1200 up
# ip addr add 2000::/64 dev dummyX
RTNETLINK answers: No buffer space available
Commit a317a2f19d ("ipv6: fail early when creating netdev named all
or default") introduced error returns in ipv6_add_dev. Before that,
that function would simply return NULL for all failures.
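A hedged sketch of the propagation (simplified, not the verbatim patch):

/* Have ipv6_find_idev() hand back ERR_PTR() on failure so callers such as
 * addrconf_add_dev() can forward the real error (e.g. -EINVAL for a
 * forbidden interface name) instead of collapsing everything to the
 * generic -ENOBUFS. */
static struct inet6_dev *ipv6_find_idev(struct net_device *dev)
{
	struct inet6_dev *idev;

	idev = __in6_dev_get(dev);
	if (!idev) {
		idev = ipv6_add_dev(dev);
		if (IS_ERR(idev))
			return idev;	/* propagate the specific error */
	}
	return idev;
}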
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The request coming from Netlink should use the OEM generic handler.
The standard command handler expects the payload in bytes/words/dwords,
but when the request comes from Netlink the actual payload is stored in data.
Signed-off-by: Justin Lee <justin.lee1@dell.com>
Reviewed-by: Vijay Khemka <vijaykhemka@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It expects an unsigned int, but gets a __be32.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 93a714d6b5 ("multicast: Extend ip address command to enable
multicast group join/leave on") we added a new flag IFA_F_MCAUTOJOIN
to let the user add a multicast address on an ethernet interface.
This works for IPv4, but not for IPv6. See the inet6_addr_add code.
static int inet6_addr_add()
{
	...
	if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
		ipv6_mc_config(net->ipv6.mc_autojoin_sk, true...)
	}
	ifp = ipv6_add_addr(idev, cfg, true, extack); <- always fail with maddr
	if (!IS_ERR(ifp)) {
		...
	} else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
		ipv6_mc_config(net->ipv6.mc_autojoin_sk, false...)
	}
}
But ipv6_add_addr() checks the address type and rejects multicast
addresses directly, so this feature has never worked for IPv6.
We should not remove the multicast address check from ipv6_add_addr()
entirely, but we can accept a multicast address when the IFA_F_MCAUTOJOIN
flag is supplied.
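A hedged sketch of the adjusted check inside ipv6_add_addr() (placement and
surrounding conditions simplified):

	/* Keep rejecting multicast addresses unless the caller explicitly
	 * asked for an autojoin via IFA_F_MCAUTOJOIN. */
	if ((addr_type & IPV6_ADDR_MULTICAST) &&
	    !(cfg->ifa_flags & IFA_F_MCAUTOJOIN))
		return ERR_PTR(-EADDRNOTAVAIL);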
v2: update commit description
Fixes: 93a714d6b5 ("multicast: Extend ip address command to enable multicast group join/leave on")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'batadv-net-for-davem-20190821' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here is a batman-adv bugfix:
- fix uninit-value in batadv_netlink_get_ifindex(), by Eric Dumazet
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 96cce12ff6 ("cfg80211: fix processing world
regdomain when non modular").
Re-triggering a reg_process_hint with the last request on all events can
make the regulatory domain fail in case of multiple WiFi modules. On
slower boards (especially with mdev), enumeration of the WiFi modules can
end up in an intersected regulatory domain, and the user cannot set it
with 'iw reg set' anymore.
This is happening because:
- 1st module enumerates, queues up a regulatory request
- request gets processed by __reg_process_hint_driver():
  - checks if the previous request was set by CORE -> yes
  - checks if the regulatory domain changed -> yes, from '00' to e.g. 'US'
  -> sends request to 'crda'
- 2nd module enumerates, queues up a regulatory request (which triggers
  the reg_todo() work)
- reg_todo() -> reg_process_pending_hints() sees that the last request
  is not processed yet, so it tries to process it again.
  __reg_process_hint_driver() will run again, and:
  - checks if the last request's initiator was the core -> no, it was
    the driver (1st WiFi module)
  - checks if the previous initiator was the driver -> yes
  - checks if the regulatory domain changed -> yes, it was '00' (set by
    the core, and the crda call did not return yet), and should be
    changed to 'US'
  ------> __reg_process_hint_driver() calls an intersect
Besides, the reg_process_hint call with the last request is meaningless
since the crda call has a timeout work. If that timeout expires, the
first module's request will be lost.
Cc: stable@vger.kernel.org
Fixes: 96cce12ff6 ("cfg80211: fix processing world regdomain when non modular")
Signed-off-by: Robert Hodaszi <robert.hodaszi@digi.com>
Link: https://lore.kernel.org/r/20190614131600.GA13897@a1-hr
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fix two shortcomings in the Extended Key ID API:
1) Allow the userspace to install pairwise keys using keyid 1 without
NL80211_KEY_NO_TX set. This allows the userspace to install and
activate pairwise keys with keyid 1 in the same way as for keyid 0,
simplifying the API usage for e.g. FILS and FT key installs.
2) IEEE 802.11-2016 restricts Extended Key ID usage to CCMP/GCMP
ciphers ("9.4.2.25.4 RSN capabilities").
Enforce that when installing a key.
Cc: stable@vger.kernel.org # 5.2
Fixes: 6cdd3979a2 ("nl80211/cfg80211: Extended Key ID support")
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Link: https://lore.kernel.org/r/20190805123400.51567-1-alexander@wetzel-home.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In ip_mc_inc_group(), a memory allocation flag, not a mcast mode, is
expected by __ip_mc_inc_group().
There is a similar issue in __ip_mc_join_group(): both the mcast mode and
a gfp_t are needed there, so use ____ip_mc_inc_group(...).
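A hedged sketch of the two corrected call sites (prototypes reconstructed
from memory and possibly inexact):

/* In ip_mc_inc_group(): the last argument of __ip_mc_inc_group() is an
 * allocation flag, not a multicast filter mode. */
void ip_mc_inc_group(struct in_device *in_dev, __be32 addr)
{
	__ip_mc_inc_group(in_dev, addr, GFP_KERNEL);
}

/* In __ip_mc_join_group(): both the filter mode and a gfp_t are needed,
 * so call the four-underscore helper that takes both. */
	____ip_mc_inc_group(in_dev, addr, mode, GFP_KERNEL);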
Fixes: 9fb20801da ("net: Fix ip_mc_{dec,inc}_group allocation context")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NCSI spec indicates that if the data does not end on a 32-bit
boundary, one to three padding bytes equal to 0x00 shall be present to
align the checksum field to a 32-bit boundary.
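A minimal sketch of the padding rule (helper name and buffer handling are
illustrative):

/* Pad with 0x00 up to the next 32-bit boundary so the checksum field that
 * follows the payload ends up 4-byte aligned, per the NCSI spec. */
static unsigned int ncsi_pad_payload(unsigned char *data, unsigned int len)
{
	unsigned int padded = ALIGN(len, 4);

	if (padded != len)
		memset(data + len, 0, padded - len);
	return padded;
}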
Signed-off-by: Terry S. Duncan <terry.s.duncan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we are only explicitly setting SOCK_NOSPACE on a write timeout
for non-blocking sockets. Epoll() edge-trigger mode relies on SOCK_NOSPACE
being set when -EAGAIN is returned to ensure that EPOLLOUT is raised.
Expand the setting of SOCK_NOSPACE to blocking sockets as well that can
use SO_SNDTIMEO to adjust their write timeout. This mirrors the behavior
that Eric Dumazet introduced for tcp sockets.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ursula Braun <ubraun@linux.ibm.com>
Cc: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a memory leak caused by a missing unpin routine for umem pages.
Fixes: 8aef7340ae ("xsk: introduce xdp_umem_page")
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Remove IP MASQUERADING record in MAINTAINERS file,
from Denis Efremov.
2) Counter arguments are swapped in ebtables, from
Todd Seidelmann.
3) Missing netlink attribute validation in flow_offload
extension.
4) Incorrect alignment in xt_nfacct that breaks 32-bit
userspace / 64-bit kernels, from Juliana Rodrigueiro.
5) Missing include guard in nf_conntrack_h323_types.h,
from Masahiro Yamada.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As Jason Baron explained in commit 790ba4566c ("tcp: set SOCK_NOSPACE
under memory pressure"), it is crucial we properly set SOCK_NOSPACE
when needed.
However, Jason's patch had a bug, because the 'nonblocking' status
as far as sk_stream_wait_memory() is concerned is governed
by the MSG_DONTWAIT flag passed at sendmsg() time:
long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
value.
This patch removes the 'noblock' variable since we must always
set SOCK_NOSPACE if -EAGAIN is returned.
It also renames the do_nonblock label since we might reach this
code path even if we were in blocking mode.
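A hedged sketch of the resulting shape of sk_stream_wait_memory() (label
placement and surrounding code simplified):

	/* ...inside the wait loop... */
	if (!timeo)
		goto do_eagain;
	/* ... */
do_eagain:
	/* Always flag the socket as out of space when bailing out with
	 * -EAGAIN, whether the caller was non-blocking or a "blocking"
	 * socket whose SO_SNDTIMEO timeout expired; epoll edge-trigger
	 * relies on SOCK_NOSPACE to raise EPOLLOUT again. */
	set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
	err = -EAGAIN;
	goto out;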
Fixes: 790ba4566c ("tcp: set SOCK_NOSPACE under memory pressure")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Reported-by: Vladimir Rutsky <rutsky@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running a 64-bit kernel with a 32-bit iptables binary, the size of
the xt_nfacct_match_info struct diverges.
kernel:   sizeof(struct xt_nfacct_match_info) : 40
iptables: sizeof(struct xt_nfacct_match_info) : 36
Trying to append nfacct related rules results in an unhelpful message.
Although it is suggested to look for more information in dmesg, nothing
can be found there.
# iptables -A <chain> -m nfacct --nfacct-name <acct-object>
iptables: Invalid argument. Run `dmesg' for more information.
This patch fixes the memory misalignment by enforcing 8-byte alignment
within the struct's first revision. This solution is often used in many
other uapi netfilter headers.
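A hedged sketch of the resulting uapi layout (struct contents abridged;
the alignment attribute is the point):

/* Forcing 8-byte alignment of the kernel-private pointer member gives the
 * struct the same size and layout for 32-bit userspace and 64-bit kernels,
 * mirroring other netfilter uapi headers. */
struct xt_nfacct_match_info_v1 {
	char		name[NFACCT_NAME_MAX];
	struct nf_acct	*nfacct __attribute__((aligned(8)));
};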
Signed-off-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The netlink attribute policy for NFTA_FLOW_TABLE_NAME is missing.
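A hedged sketch of the missing entry (array and length constants are
illustrative):

static const struct nla_policy nft_flow_offload_policy[NFTA_FLOW_MAX + 1] = {
	/* Validate the flowtable name as a bounded string. */
	[NFTA_FLOW_TABLE_NAME]	= { .type = NLA_STRING,
				    .len  = NFT_NAME_MAXLEN - 1 },
};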
Fixes: a3c90f7a23 ("netfilter: nf_tables: flow offload expression")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The ordering of arguments to the x_tables ADD_COUNTER macro
appears to be wrong in ebtables (cf. ip_tables.c, ip6_tables.c,
and arp_tables.c).
This causes data corruption in the ebtables userspace tools
because they get incorrect packet & byte counts from the kernel.
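For reference, ADD_COUNTER(counter, bytes, packets) adds bytes first and
packets second (as used in ip_tables.c); a sketch of the corrected call:

	/* was: ADD_COUNTER(*(counter_base + i), 1, skb->len); which swapped
	 * the packet and byte counts seen by userspace */
	ADD_COUNTER(*(counter_base + i), skb->len, 1);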
Fixes: d72133e628 ("netfilter: ebtables: use ADD_COUNTER macro")
Signed-off-by: Todd Seidelmann <tseidelmann@linode.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds initial support for offloading basechains using the
priority range from 1 to 65535. This restricts the netfilter priority
range to a 16-bit integer, since this is what most drivers assume so far
from tc. It should be possible to extend the range of supported
priorities later on, once drivers are updated to support 32-bit integer
priorities.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth 2019-08-17
Here's a set of Bluetooth fixes for the 5.3-rc series:
- Multiple fixes for Qualcomm (btqca & hci_qca) drivers
- Minimum encryption key size debugfs setting (this is required for
Bluetooth Qualification)
- Fix hidp_send_message() to have a meaningful return value
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
For testing and qualification purposes it is useful to allow changing
the minimum encryption key size value that the host stack is going to
enforce. This adds a new debugfs setting min_encrypt_key_size to achieve
this functionality.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
This commit eliminates the use of the link 'stale_limit' & 'prev_from'
(besides the already removed 'stale_cnt') variables in the detection of
repeated retransmit failures, as there is no proper way to initialize them
to avoid a false detection, i.e. a case that is not really a retransmission
failure but is caused by garbage values in the variables.
Instead, a jiffies variable will be added to individual skbs (similar to
the way we restrict skb retransmissions) in order to mark the first skb
retransmit time. Later on, at the next retransmissions, the timestamp will
be checked to see if the skb in the link transmq is "too stale", that is,
the link tolerance time has passed, so that a link reset will be ordered.
Note that checking only the first skb in the queue is sufficient since it
must be the oldest one.
A counter is also added to keep track of the actual number of skb
retransmissions, for later checking when the failure happens.
The downside of this approach is that the skb->cb[] buffer is nearly
exhausted; however, we can always allocate another memory area and keep a
reference to it when needed.
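A hedged sketch of the per-skb bookkeeping and the staleness check (member
and helper names are illustrative, not the exact TIPC symbols):

struct tipc_skb_cb {
	unsigned long	retr_stamp;	/* jiffies of first retransmission */
	u16		retr_cnt;	/* retransmissions so far */
};

static bool link_retransmit_failure(struct tipc_link *l, struct sk_buff *skb)
{
	struct tipc_skb_cb *cb = TIPC_SKB_CB(skb);

	/* Only the head of the transmq needs checking: it is the oldest. */
	if (cb->retr_cnt &&
	    time_after(jiffies, cb->retr_stamp + msecs_to_jiffies(l->tolerance)))
		return true;	/* too stale: order a link reset */
	return false;
}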
Fixes: 77cf8edbc0 ("tipc: simplify stale link failure criteria")
Reported-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'rxrpc-fixes-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Fix local endpoint handling
Here's a pair of patches that fix two issues in the handling of local
endpoints (rxrpc_local structs):
(1) Use list_replace_init() rather than list_replace() if we're going to
unconditionally delete the replaced item later, lest the list get
corrupted.
(2) Don't access the rxrpc_local object after passing our ref to the
workqueue, not even to illuminate tracepoints, as the work function
may cause the object to be freed. We have to cache the information
beforehand.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
This patchset contains Netfilter fixes for net:
1) Extend selftest to cover flowtable with ipsec, from Florian Westphal.
2) Fix interaction of ipsec with flowtable, also from Florian.
3) Use-after-free with a bound set in a rule that fails to load.
4) Adjust state and timeout for flows that expire.
5) Timeout update race with flows in teardown state.
6) Ensure the conntrack id hash calculation uses invariants as input,
from Dirk Morris.
7) Do not push flows into flowtable for TCP fin/rst packets.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The ctx->sk_write_space pointer is only set when TLS TX mode is enabled.
When running without TX mode it is a NULL pointer, but we still set
sk->sk_write_space on close().
Fix the close path to only overwrite sk->sk_write_space when it currently
points to the tls_write_space function, indicating that the tls module
should clean it up properly as well.
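A hedged sketch of the guarded restore in the close path:

	/* Only put the saved callback back if tls actually installed its
	 * own write_space; without TX mode, ctx->sk_write_space was never
	 * populated and must not be written into the socket. */
	if (sk->sk_write_space == tls_write_space)
		sk->sk_write_space = ctx->sk_write_space;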
Reported-by: Hillf Danton <hdanton@sina.com>
Cc: Ying Xue <ying.xue@windriver.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Fixes: 57c722e932 ("net/tls: swap sk_write_space on close")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rxrpc_queue_local() attempts to queue the local endpoint it is given and
then, if successful, prints a trace line. The trace line includes the
current usage count - but we're not allowed to look at the local endpoint
at this point as we passed our ref on it to the workqueue.
Fix this by reading the usage count before queuing the work item.
Also fix the reading of local->debug_id for trace lines, which must be done
with the same consideration as reading the usage count.
Fixes: 09d2bf595d ("rxrpc: Add a tracepoint to track rxrpc_local refcounting")
Reported-by: syzbot+78e71c5bab4f76a6a719@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
When a local endpoint (struct rxrpc_local) ceases to be in use by any
AF_RXRPC sockets, it starts the process of being destroyed, but this
doesn't cause it to be removed from the namespace endpoint list immediately
as tearing it down isn't trivial and can't be done in softirq context, so
it gets deferred.
If a new socket comes along that wants to bind to the same endpoint, a new
rxrpc_local object will be allocated and rxrpc_lookup_local() will use
list_replace() to substitute the new one for the old.
Then, when the dying object gets to rxrpc_local_destroyer(), it is removed
unconditionally from whatever list it is on by calling list_del_init().
However, list_replace() doesn't reset the pointers in the replaced
list_head and so the list_del_init() will likely corrupt the local
endpoints list.
Fix this by using list_replace_init() instead.
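For reference, a minimal sketch of the call (list member name is
illustrative): list_replace() leaves the replaced entry's own next/prev
pointers untouched, so a later list_del_init() on it rewrites live nodes,
whereas list_replace_init() also reinitialises the old entry and makes the
later deletion harmless.

	list_replace_init(&old_local->link, &new_local->link);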
Fixes: 730c5fd42c ("rxrpc: Fix local endpoint refcounting")
Reported-by: syzbot+193e29e9387ea5837f1d@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
TCP rst and fin packets do not qualify to place a flow into the
flowtable. Most likely there will be no more packets after connection
closure. Without this patch, this flow entry expires and connection
tracking picks up the entry in ESTABLISHED state using the fixup
timeout, which makes this look inconsistent to the user for a connection
that is actually already closed.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If the stream outq is not empty, we need to kfree nstr_list.
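A hedged sketch of the early-return path that leaked the list (helper name
taken from the Fixes commit; arguments abridged from memory):

	if (!sctp_stream_outq_is_empty(stream, str_nums, nstr_list)) {
		retval = -EAGAIN;
		kfree(nstr_list);	/* previously leaked on this path */
		goto out;
	}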
Fixes: d570a59c5b ("sctp: only allow the out stream reset when the stream outq is empty")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
As the annotation says in sctp_do_8_2_transport_strike():
"If the transport error count is greater than the pf_retrans
threshold, and less than pathmaxrtx ..."
It should be transport->error_count checked against pathmaxrxt,
instead of asoc->pf_retrans.
Fixes: 5aa93bcf66 ("sctp: Implement quick failover draft from tsvwg")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Variable rc is initialized to a value that is never read and it is
re-assigned later. The initialization is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Change the ct id hash calculation to only use invariants.
Currently the ct id hash calculation is based on some fields that can
change during the lifetime of a conntrack entry in some corner cases. The
current hash uses the whole tuple, which contains an hlist pointer that
will change when the conntrack is placed on the dying list, resulting in
a ct id change.
This patch also removes the reply-side tuple and the extension pointer
from the hash calculation, so that the ct id will not change from
initialization until confirmation.
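A hedged sketch of an id derived only from invariants (illustrative;
mirrors the siphash-over-fixed-inputs approach of the Fixes commit):

/* Hash only values that stay constant from allocation to confirmation:
 * the object address, its netns, and the original-direction tuple, rather
 * than the whole tuplehash (which embeds hlist pointers), the reply-side
 * tuple or the extension pointer. */
static u64 ct_get_id(const struct nf_conn *ct)
{
	static siphash_key_t ct_id_seed __read_mostly;
	u64 tuple_hash;

	net_get_random_once(&ct_id_seed, sizeof(ct_id_seed));

	tuple_hash = siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
			     sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple),
			     &ct_id_seed);

	return siphash_3u64((unsigned long)ct, (unsigned long)nf_ct_net(ct),
			    tuple_hash, &ct_id_seed);
}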
Fixes: 3c79107631 ("netfilter: ctnetlink: don't use conntrack/expect object addresses as id")
Signed-off-by: Dirk Morris <dmorris@metaloft.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Let hidp_send_message return the number of successfully queued bytes
instead of an unconditional 0.
With the return value fixed to 0, other drivers relying on hidp, such as
hidraw, can not return meaningful values from their respective
implementations of write(). In particular, with the current behavior, a
hidraw device's write() will have different return values depending on
whether the device is connected via USB or Bluetooth, which makes it
harder to abstract away the transport layer.
Signed-off-by: Fabian Henneke <fabian.henneke@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We set the field 'addr_trial_end' to 'jiffies', instead of the current
value 0, at the moment the node address is initialized. This guarantees
we don't inadvertently enter an address trial period when the node
address is explicitly set by the user.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix rxrpc_unuse_local() to handle a NULL local pointer as it can be called
on an unbound socket on which rx->local is not yet set.
The following reproducer (includes omitted):
int main(void)
{
socket(AF_RXRPC, SOCK_DGRAM, AF_INET);
return 0;
}
causes the following oops to occur:
BUG: kernel NULL pointer dereference, address: 0000000000000010
...
RIP: 0010:rxrpc_unuse_local+0x8/0x1b
...
Call Trace:
rxrpc_release+0x2b5/0x338
__sock_release+0x37/0xa1
sock_close+0x14/0x17
__fput+0x115/0x1e9
task_work_run+0x72/0x98
do_exit+0x51b/0xa7a
? __context_tracking_exit+0x4e/0x10e
do_group_exit+0xab/0xab
__x64_sys_exit_group+0x14/0x17
do_syscall_64+0x89/0x1d4
entry_SYSCALL_64_after_hwframe+0x49/0xbe
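A minimal sketch of the guard (the rest of the function is unchanged):

void rxrpc_unuse_local(struct rxrpc_local *local)
{
	/* An unbound socket never set rx->local, so tolerate NULL here. */
	if (!local)
		return;
	/* ...drop the active-user count and defer destruction as before... */
}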
Reported-by: syzbot+20dee719a2e090427b5f@syzkaller.appspotmail.com
Fixes: 730c5fd42c ("rxrpc: Fix local endpoint refcounting")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we swap the original proto and clear the ULP pointer
on close, we have to make sure no callback will try to access
the freed state. sk_write_space is not part of sk_prot, so remember
to swap it as well.
Reported-by: syzbot+dcdc9deefaec44785f32@syzkaller.appspotmail.com
Fixes: 95fa145479 ("bpf: sockmap/tls, close can race with map free")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Generating and retrieving socket cookies is a useful feature that is
exposed to BPF for various program types through the
bpf_get_socket_cookie() helper.
The fact that the cookie counter is per netns is quite a limitation for
BPF in practice, in particular for programs in the host namespace that
use socket cookies as part of a map lookup key, since they will be
causing socket cookie collisions, e.g. when attached to BPF cgroup hooks
or cls_bpf on tc egress in the host namespace handling container traffic
from veth or ipvlan devices with a peer in a different netns. Change the
counter to be global instead.
Socket cookie consumers must assume the value as opaque in any case. Not
every socket must have a cookie generated, and knowledge of the counter
value itself does not provide much value either way, hence the conversion
to a global counter is fine.
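A hedged sketch of the counter change (the global variable name is
illustrative):

static atomic64_t sock_cookie_gen;	/* was a per-netns counter */

u64 sock_gen_cookie(struct sock *sk)
{
	while (1) {
		u64 res = atomic64_read(&sk->sk_cookie);

		if (res)
			return res;
		/* One global, monotonically increasing source, so cookies
		 * from different namespaces cannot collide. */
		res = atomic64_inc_return(&sock_cookie_gen);
		atomic64_cmpxchg(&sk->sk_cookie, 0, res);
	}
}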
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Martynas Pumputis <m@lambda.lt>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't bother generating maxSkew in the ACK packet as it has been obsolete
since AFS 3.1.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
The object lifetime management on the rxrpc_local struct is broken in that
the rxrpc_local_processor() function is expected to clean up and remove an
object - but it may get requeued by packets coming in on the backing UDP
socket once it starts running.
This may result in the assertion in rxrpc_local_rcu() firing because the
memory has been scheduled for RCU destruction whilst still queued:
rxrpc: Assertion failed
------------[ cut here ]------------
kernel BUG at net/rxrpc/local_object.c:468!
Note that if the processor comes around before the RCU free function, it
will just do nothing because ->dead is true.
Fix this by adding a separate refcount that counts active users of the
endpoint and causes the endpoint to be destroyed when it reaches 0.
The original refcount can then be used to refcount objects through the
work processor and cause the memory to be RCU-freed when that reaches 0.
Fixes: 4f95dd78a7 ("rxrpc: Rework local endpoint management")
Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Flows that are in teardown state (due to RST / FIN TCP packet) still
have their offload flag set on. Hence, the conntrack garbage collector
may race to undo the timeout adjustment that the fixup routine performs,
leaving the conntrack entry in place with the internal offload timeout
(one day).
Update teardown flow state to ESTABLISHED and set tracking to liberal,
then once the offload bit is cleared, adjust timeout if it is more than
the default fixup timeout (conntrack might already have set a lower
timeout from the packet path).
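A hedged sketch of the fixup (helper and constant names approximate the
flowtable code; the timeout clamp is the key point):

static void flow_offload_fixup_ct(struct nf_conn *ct)
{
	/* illustrative stand-in for the protocol's fixup timeout */
	unsigned int timeout = 5 * 60 * HZ;

	if (nf_ct_protonum(ct) == IPPROTO_TCP) {
		ct->proto.tcp.state = TCP_CONNTRACK_ESTABLISHED;
		ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
		ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
	}

	/* Only lower the timeout; never keep the one-day offload timeout
	 * left over from the fast path. */
	if (nf_ct_expires(ct) > timeout)
		ct->timeout = nfct_time_stamp + timeout;
}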
Fixes: da5984e510 ("netfilter: nf_flow_table: add support for sending flows back to the slow path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Update conntrack entry to pick up expired flows, otherwise the conntrack
entry gets stuck with the internal offload timeout (one day). The TCP
state also needs to be adjusted to ESTABLISHED state and tracking is set
to liberal mode in order to give conntrack a chance to pick up the
expired flow.
Fixes: ac2a66665e ("netfilter: add generic flow table infrastructure")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If a rule that already has a bound anonymous set fails to be added, the
preparation phase releases the rule and the bound set. However, the
transaction object from the abort path still has a reference to the set
object that is stale, leading to a use-after-free when checking for the
set->bound field. Add a new field to the transaction that specifies if
the set is bound, so the abort path can skip releasing it since the rule
command owns it and it takes care of releasing it. After this update,
the set->bound field is removed.
[ 24.649883] Unable to handle kernel paging request at virtual address 0000000000040434
[ 24.657858] Mem abort info:
[ 24.660686] ESR = 0x96000004
[ 24.663769] Exception class = DABT (current EL), IL = 32 bits
[ 24.669725] SET = 0, FnV = 0
[ 24.672804] EA = 0, S1PTW = 0
[ 24.675975] Data abort info:
[ 24.678880] ISV = 0, ISS = 0x00000004
[ 24.682743] CM = 0, WnR = 0
[ 24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000
[ 24.692207] [0000000000040434] pgd=0000000000000000
[ 24.697119] Internal error: Oops: 96000004 [#1] SMP
[...]
[ 24.889414] Call trace:
[ 24.891870] __nf_tables_abort+0x3f0/0x7a0
[ 24.895984] nf_tables_abort+0x20/0x40
[ 24.899750] nfnetlink_rcv_batch+0x17c/0x588
[ 24.904037] nfnetlink_rcv+0x13c/0x190
[ 24.907803] netlink_unicast+0x18c/0x208
[ 24.911742] netlink_sendmsg+0x1b0/0x350
[ 24.915682] sock_sendmsg+0x4c/0x68
[ 24.919185] ___sys_sendmsg+0x288/0x2c8
[ 24.923037] __sys_sendmsg+0x7c/0xd0
[ 24.926628] __arm64_sys_sendmsg+0x2c/0x38
[ 24.930744] el0_svc_common.constprop.0+0x94/0x158
[ 24.935556] el0_svc_handler+0x34/0x90
[ 24.939322] el0_svc+0x8/0xc
[ 24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863)
[ 24.948336] ---[ end trace cebbb9dcbed3b56f ]---
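A hedged sketch of moving the bookkeeping into the transaction object
(member names approximate the nf_tables ones):

/* Record "this set is bound to a rule" in the transaction itself, so the
 * abort path can test the transaction instead of dereferencing a set
 * object that the failed rule command may already have released.  The
 * abort path skips releasing the set whenever this flag is true. */
struct nft_trans_set {
	struct nft_set	*set;
	u32		set_id;
	bool		bound;
};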
Fixes: f6ac858589 ("netfilter: nf_tables: unbind set in rule from commit path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
sk_validate_xmit_skb() and drivers depend on the sk member of
struct sk_buff to identify segments requiring encryption.
Any operation which removes or does not preserve the original TLS
socket such as skb_orphan() or skb_clone() will cause clear text
leaks.
Make the TCP socket underlying an offloaded TLS connection
mark all skbs as decrypted, if TLS TX is in offload mode.
Then in sk_validate_xmit_skb() catch skbs which have no socket
(or a socket with no validation) and decrypted flag set.
Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
sk->sk_validate_xmit_skb are slightly interchangeable right now,
they all imply TLS offload. The new checks are guarded by
CONFIG_TLS_DEVICE because that's the option guarding the
sk_buff->decrypted member.
A second, smaller issue with orphaning is that it breaks
the guarantee that packets will be delivered to device
queues in-order. All TLS offload drivers depend on that
scheduling property. This means skb_orphan_partial()'s
trick of preserving partial socket references will cause
issues in the drivers. We need a full orphan, and as a
result netem delay/throttling will cause all TLS offload
skbs to be dropped.
Reusing the sk_buff->decrypted flag also protects from
leaking clear text when incoming, decrypted skb is redirected
(e.g. by TC).
See commit 0608c69c9a ("bpf: sk_msg, sock{map|hash} redirect
through ULP") for justification why the internal flag is safe.
The only location which could leak the flag in is tcp_bpf_sendmsg(),
which is taken care of by clearing the previously unused bit.
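A hedged sketch of the catch in sk_validate_xmit_skb() (config guards and
surrounding code simplified):

static inline struct sk_buff *sk_validate_xmit_skb(struct sk_buff *skb,
						   struct net_device *dev)
{
	struct sock *sk = skb->sk;

	if (sk && sk_fullsock(sk) && sk->sk_validate_xmit_skb) {
		skb = sk->sk_validate_xmit_skb(sk, dev, skb);
#ifdef CONFIG_TLS_DEVICE
	} else if (unlikely(skb->decrypted)) {
		/* Decrypted payload lost its TLS socket (orphan/clone):
		 * drop it rather than send clear text on the wire. */
		pr_warn_ratelimited("unencrypted skb with no associated socket - dropping\n");
		kfree_skb(skb);
		skb = NULL;
#endif
	}
	return skb;
}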
v2:
- remove superfluous decrypted mark copy (Willem);
- remove the stale doc entry (Boris);
- rely entirely on EOR marking to prevent coalescing (Boris);
- use an internal sendpages flag instead of marking the socket
(Boris).
v3 (Willem):
- reorganize the can_skb_orphan_partial() condition;
- fix the flag leak-in through tcp_bpf_sendmsg.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add get_fill_size() routine used to calculate the action size
when building a batch of events.
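A hedged sketch of such a callback for skbedit (attribute list abridged
and illustrative):

/* Report a worst-case netlink size for one skbedit action so the batching
 * code can size event buffers correctly. */
static size_t tcf_skbedit_get_fill_size(const struct tc_action *act)
{
	return nla_total_size(sizeof(struct tc_skbedit))
		+ nla_total_size(sizeof(u32))	/* TCA_SKBEDIT_PRIORITY */
		+ nla_total_size(sizeof(u16))	/* TCA_SKBEDIT_QUEUE_MAPPING */
		+ nla_total_size(sizeof(u32))	/* TCA_SKBEDIT_MARK */
		+ nla_total_size(sizeof(u16))	/* TCA_SKBEDIT_PTYPE */
		+ nla_total_size(sizeof(u32));	/* TCA_SKBEDIT_MASK */
}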
Fixes: ca9b0e27e ("pkt_action: add new action skbedit")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>