linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-24 14:10:49 +07:00

Author	SHA1	Message	Date
Ridge Kennedy	0d0d9a388a	l2tp: Allow duplicate session creation with UDP In the past it was possible to create multiple L2TPv3 sessions with the same session id as long as the sessions belonged to different tunnels. The resulting sessions had issues when used with IP encapsulated tunnels, but worked fine with UDP encapsulated ones. Some applications began to rely on this behaviour to avoid having to negotiate unique session ids. Some time ago a change was made to require session ids to be unique across all tunnels, breaking the applications making use of this "feature". This change relaxes the duplicate session id check to allow duplicates if both of the colliding sessions belong to UDP encapsulated tunnels. Fixes: `dbdbc73b44` ("l2tp: fix duplicate session creation") Signed-off-by: Ridge Kennedy <ridge.kennedy@alliedtelesis.co.nz> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-04 12:35:49 +01:00
Cong Wang	599be01ee5	net_sched: fix an OOB access in cls_tcindex As Eric noticed, tcindex_alloc_perfect_hash() uses cp->hash to compute the size of memory allocation, but cp->hash is set again after the allocation, this caused an out-of-bound access. So we have to move all cp->hash initialization and computation before the memory allocation. Move cp->mask and cp->shift together as cp->hash may need them for computation too. Reported-and-tested-by: syzbot+35d4dea36c387813ed31@syzkaller.appspotmail.com Fixes: `331b72922c` ("net: sched: RCU cls_tcindex") Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-04 11:41:36 +01:00
Eric Dumazet	2b5b8251bc	net: hsr: fix possible NULL deref in hsr_handle_frame() hsr_port_get_rcu() can return NULL, so we need to be careful. general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] CPU: 1 PID: 10249 Comm: syz-executor.5 Not tainted 5.5.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__read_once_size include/linux/compiler.h:199 [inline] RIP: 0010:hsr_addr_is_self+0x86/0x330 net/hsr/hsr_framereg.c:44 Code: 04 00 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 e8 6b ff 94 f9 4c 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 75 02 00 00 48 8b 43 30 49 39 c6 49 89 47 c0 0f RSP: 0018:ffffc90000da8a90 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff87e0cc33 RDX: 0000000000000006 RSI: ffffffff87e035d5 RDI: 0000000000000000 RBP: ffffc90000da8b20 R08: ffff88808e7de040 R09: ffffed1015d2707c R10: ffffed1015d2707b R11: ffff8880ae9383db R12: ffff8880a689bc5e R13: 1ffff920001b5153 R14: 0000000000000030 R15: ffffc90000da8af8 FS: 00007fd7a42be700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32338000 CR3: 00000000a928c000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> hsr_handle_frame+0x1c5/0x630 net/hsr/hsr_slave.c:31 __netif_receive_skb_core+0xfbc/0x30b0 net/core/dev.c:5099 __netif_receive_skb_one_core+0xa8/0x1a0 net/core/dev.c:5196 __netif_receive_skb+0x2c/0x1d0 net/core/dev.c:5312 process_backlog+0x206/0x750 net/core/dev.c:6144 napi_poll net/core/dev.c:6582 [inline] net_rx_action+0x508/0x1120 net/core/dev.c:6650 __do_softirq+0x262/0x98c kernel/softirq.c:292 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082 </IRQ> Fixes: `c5a7591172` ("net/hsr: Use list_head (and rcu) instead of array for slave devices.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-02-04 09:27:07 +01:00
Jakub Kicinski	3d80c653f9	RxRPC fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl439W0ACgkQ+7dXa6fL C2uT5RAAl+rytJS768Aa6c/S9TPBaNOKzFQmgNaWzAtgC5BAtfZf1Bl0okFOba2V QtcFmTogRl7BEnf9S3w3z25qj7Nv5udjIaAbH7cO6j1vuIwRpvtEZ7WXlE3L4hxA DscSysqI9L5ISsnzfloUvA/biA9azsH2ckgMxFo++YmLHvagXW0lmkE4yZw+aZ/T DstFAMAnFQwEAyu+Et3bo32/382aJh+gBGgV/2wuHBPgzQMhiWdosKdNnEZtZ3V2 HvgVuAU2V4hOf8OuaPuBnZrJPondTq5e1W+5mQiYLYzTCTOs6rZ1gUGfEXth3d1H ABJ0FoyPgN783msb0yL6OOLMWiTc9USiLB8a3U2vJOD+hQTkPSYSGsCObeIGVW/Y dbBd5fwudF78LIP9TjYWbau3vHtV73N4Mc1UT219jew0+Hi1ik5VIj8q0wdBIkzm vKcVznXnPReOUBVKmqA2scnXs8EA4w6bTCjyd3JYUBK2qLchz366s+0oYyIK8Nq0 dy8TVISaCSfohI4nAYAv90AUIDa+zdHKuttBsAD90yVM/wrnokqc/RonfWjHs32k SerM0RclPidool3Hkc/w0uvPywDQg/S5sKAW6E6lpfTppOwIY0wq/yZc5N2d9FZV UrPZVGhQszC7AM/CBKXDT0ZMrPnAfGO8PMPLyFrAcv2p9O7XuWA= =kHIG -----END PGP SIGNATURE----- Merge tag 'rxrpc-fixes-20200203' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== RxRPC fixes Here are a number of fixes for AF_RXRPC: (1) Fix a potential use after free in rxrpc_put_local() where it was accessing the object just put to get tracing information. (2) Fix insufficient notifications being generated by the function that queues data packets on a call. This occasionally causes recvmsg() to stall indefinitely. (3) Fix a number of packet-transmitting work functions to hold an active count on the local endpoint so that the UDP socket doesn't get destroyed whilst they're calling kernel_sendmsg() on it. (4) Fix a NULL pointer deref that stemmed from a call's connection pointer being cleared when the call was disconnected. Changes: v2: Removed a couple of BUG() statements that got added. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-02-03 10:26:23 -08:00
David Howells	5273a191dc	rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect When a call is disconnected, the connection pointer from the call is cleared to make sure it isn't used again and to prevent further attempted transmission for the call. Unfortunately, there might be a daemon trying to use it at the same time to transmit a packet. Fix this by keeping call->conn set, but setting a flag on the call to indicate disconnection instead. Remove also the bits in the transmission functions where the conn pointer is checked and a ref taken under spinlock as this is now redundant. Fixes: `8d94aa381d` ("rxrpc: Calls shouldn't hold socket refs") Signed-off-by: David Howells <dhowells@redhat.com>	2020-02-03 10:25:30 +00:00
SeongJae Park	9603d47bad	tcp: Reduce SYN resend delay if a suspicous ACK is received When closing a connection, the two acks that required to change closing socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in reverse order. This is possible in RSS disabled environments such as a connection inside a host. For example, expected state transitions and required packets for the disconnection will be similar to below flow. 00 (Process A) (Process B) 01 ESTABLISHED ESTABLISHED 02 close() 03 FIN_WAIT_1 04 ---FIN--> 05 CLOSE_WAIT 06 <--ACK--- 07 FIN_WAIT_2 08 <--FIN/ACK--- 09 TIME_WAIT 10 ---ACK--> 11 LAST_ACK 12 CLOSED CLOSED In some cases such as LINGER option applied socket, the FIN and FIN/ACK will be substituted to RST and RST/ACK, but there is no difference in the main logic. The acks in lines 6 and 8 are the acks. If the line 8 packet is processed before the line 6 packet, it will be just ignored as it is not a expected packet, and the later process of the line 6 packet will change the status of Process A to FIN_WAIT_2, but as it has already handled line 8 packet, it will not go to TIME_WAIT and thus will not send the line 10 packet to Process B. Thus, Process B will left in CLOSE_WAIT status, as below. 00 (Process A) (Process B) 01 ESTABLISHED ESTABLISHED 02 close() 03 FIN_WAIT_1 04 ---FIN--> 05 CLOSE_WAIT 06 (<--ACK---) 07 (<--FIN/ACK---) 08 (fired in right order) 09 <--FIN/ACK--- 10 <--ACK--- 11 (processed in reverse order) 12 FIN_WAIT_2 Later, if the Process B sends SYN to Process A for reconnection using the same port, Process A will responds with an ACK for the last flow, which has no increased sequence number. Thus, Process A will send RST, wait for TIMEOUT_INIT (one second in default), and then try reconnection. If reconnections are frequent, the one second latency spikes can be a big problem. Below is a tcpdump results of the problem: 14.436259 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644 14.436266 IP 127.0.0.1.4242 > 127.0.0.1.45150: Flags [.], ack 5, win 512 14.436271 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [R], seq 2541101298 /* ONE SECOND DELAY */ 15.464613 IP 127.0.0.1.45150 > 127.0.0.1.4242: Flags [S], seq 2560603644 This commit mitigates the problem by reducing the delay for the next SYN if the suspicous ACK is received while in SYN_SENT state. Following commit will add a selftest, which can be also helpful for understanding of this issue. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-02-02 13:33:21 -08:00
Jakub Kicinski	b7c3a17c60	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix suspicious RCU usage in ipset, from Jozsef Kadlecsik. 2) Use kvcalloc, from Joe Perches. 3) Flush flowtable hardware workqueue after garbage collection run, from Paul Blakey. 4) Missing flowtable hardware workqueue flush from nf_flow_table_free(), also from Paul. 5) Restore NF_FLOW_HW_DEAD in flow_offload_work_del(), from Paul. 6) Flowtable documentation fixes, from Matteo Croce. ==================== Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-02-01 12:38:20 -08:00
Eric Dumazet	cb3c0e6bdf	cls_rsvp: fix rsvp_policy NLA_BINARY can be confusing, since .len value represents the max size of the blob. cls_rsvp really wants user space to provide long enough data for TCA_RSVP_DST and TCA_RSVP_SRC attributes. BUG: KMSAN: uninit-value in rsvp_get net/sched/cls_rsvp.h:258 [inline] BUG: KMSAN: uninit-value in gen_handle net/sched/cls_rsvp.h:402 [inline] BUG: KMSAN: uninit-value in rsvp_change+0x1ae9/0x4220 net/sched/cls_rsvp.h:572 CPU: 1 PID: 13228 Comm: syz-executor.1 Not tainted 5.5.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 rsvp_get net/sched/cls_rsvp.h:258 [inline] gen_handle net/sched/cls_rsvp.h:402 [inline] rsvp_change+0x1ae9/0x4220 net/sched/cls_rsvp.h:572 tc_new_tfilter+0x31fe/0x5010 net/sched/cls_api.c:2104 rtnetlink_rcv_msg+0xcb7/0x1570 net/core/rtnetlink.c:5415 netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5442 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330 ___sys_sendmsg net/socket.c:2384 [inline] __sys_sendmsg+0x451/0x5f0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45b349 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f269d43dc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f269d43e6d4 RCX: 000000000045b349 RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000003 RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000009c2 R14: 00000000004cb338 R15: 000000000075bfd4 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline] kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:82 slab_alloc_node mm/slub.c:2774 [inline] __kmalloc_node_track_caller+0xb40/0x1200 mm/slub.c:4382 __kmalloc_reserve net/core/skbuff.c:141 [inline] __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209 alloc_skb include/linux/skbuff.h:1049 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1174 [inline] netlink_sendmsg+0x7d3/0x14d0 net/netlink/af_netlink.c:1892 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2330 ___sys_sendmsg net/socket.c:2384 [inline] __sys_sendmsg+0x451/0x5f0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `6fa8c0144b` ("[NET_SCHED]: Use nla_policy for attribute validation in classifiers") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-02-01 12:25:06 -08:00
Eric Dumazet	784f8344de	tcp: clear tp->segs_{in\|out} in tcp_disconnect() tp->segs_in and tp->segs_out need to be cleared in tcp_disconnect(). tcp_disconnect() is rarely used, but it is worth fixing it. Fixes: `2efd055c53` ("tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Marcelo Ricardo Leitner <mleitner@redhat.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-31 22:12:37 -08:00
Eric Dumazet	db7ffee6f3	tcp: clear tp->data_segs{in\|out} in tcp_disconnect() tp->data_segs_in and tp->data_segs_out need to be cleared in tcp_disconnect(). tcp_disconnect() is rarely used, but it is worth fixing it. Fixes: `a44d6eacda` ("tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-31 22:12:22 -08:00
Eric Dumazet	2fbdd56251	tcp: clear tp->delivered in tcp_disconnect() tp->delivered needs to be cleared in tcp_disconnect(). tcp_disconnect() is rarely used, but it is worth fixing it. Fixes: `ddf1af6fa0` ("tcp: new delivery accounting") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-31 22:12:18 -08:00
Eric Dumazet	c13c48c00a	tcp: clear tp->total_retrans in tcp_disconnect() total_retrans needs to be cleared in tcp_disconnect(). tcp_disconnect() is rarely used, but it is worth fixing it. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: SeongJae Park <sjpark@amazon.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-01-31 22:11:35 -08:00
Paul Blakey	c22208b7ce	netfilter: flowtable: Fix setting forgotten NF_FLOW_HW_DEAD flag During the refactor this was accidently removed. Fixes: `ae29045018` ("netfilter: flowtable: add nf_flow_offload_tuple() helper") Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-01-31 19:31:42 +01:00
Paul Blakey	0f34f30a1b	netfilter: flowtable: Fix missing flush hardware on table free If entries exist when freeing a hardware offload enabled table, we queue work for hardware while running the gc iteration. Execute it (flush) after queueing. Fixes: `c29f74e0df` ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-01-31 19:31:41 +01:00
Paul Blakey	91bfaa15a3	netfilter: flowtable: Fix hardware flush order on nf_flow_table_cleanup On netdev down event, nf_flow_table_cleanup() is called for the relevant device and it cleans all the tables that are on that device. If one of those tables has hardware offload flag, nf_flow_table_iterate_cleanup flushes hardware and then runs the gc. But the gc can queue more hardware work, which will take time to execute. Instead first add the work, then flush it, to execute it now. Fixes: `c29f74e0df` ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-01-31 19:31:40 +01:00
Joe Perches	b9e0102a57	netfilter: Use kvcalloc Convert the uses of kvmalloc_array with __GFP_ZERO to the equivalent kvcalloc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-01-31 19:30:54 +01:00
David Howells	04d36d748f	rxrpc: Fix missing active use pinning of rxrpc_local object The introduction of a split between the reference count on rxrpc_local objects and the usage count didn't quite go far enough. A number of kernel work items need to make use of the socket to perform transmission. These also need to get an active count on the local object to prevent the socket from being closed. Fix this by getting the active count in those places. Also split out the raw active count get/put functions as these places tend to hold refs on the rxrpc_local object already, so getting and putting an extra object ref is just a waste of time. The problem can lead to symptoms like: BUG: kernel NULL pointer dereference, address: 0000000000000018 .. CPU: 2 PID: 818 Comm: kworker/u9:0 Not tainted 5.5.0-fscache+ #51 ... RIP: 0010:selinux_socket_sendmsg+0x5/0x13 ... Call Trace: security_socket_sendmsg+0x2c/0x3e sock_sendmsg+0x1a/0x46 rxrpc_send_keepalive+0x131/0x1ae rxrpc_peer_keepalive_worker+0x219/0x34b process_one_work+0x18e/0x271 worker_thread+0x1a3/0x247 kthread+0xe6/0xeb ret_from_fork+0x1f/0x30 Fixes: `730c5fd42c` ("rxrpc: Fix local endpoint refcounting") Signed-off-by: David Howells <dhowells@redhat.com>	2020-01-30 21:50:41 +00:00
David Howells	f71dbf2fb2	rxrpc: Fix insufficient receive notification generation In rxrpc_input_data(), rxrpc_notify_socket() is called if the base sequence number of the packet is immediately following the hard-ack point at the end of the function. However, this isn't sufficient, since the recvmsg side may have been advancing the window and then overrun the position in which we're adding - at which point rx_hard_ack >= seq0 and no notification is generated. Fix this by always generating a notification at the end of the input function. Without this, a long call may stall, possibly indefinitely. Fixes: `248f219cb8` ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com>	2020-01-30 21:50:41 +00:00
David Howells	fac20b9e73	rxrpc: Fix use-after-free in rxrpc_put_local() Fix rxrpc_put_local() to not access local->debug_id after calling atomic_dec_return() as, unless that returned n==0, we no longer have the right to access the object. Fixes: `06d9532fa6` ("rxrpc: Fix read-after-free in rxrpc_queue_local()") Signed-off-by: David Howells <dhowells@redhat.com>	2020-01-30 21:50:41 +00:00
Linus Torvalds	1c715a659a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Various mptcp fixupes from Florian Westphal and Geery Uytterhoeven. 2) Don't clear the node/port GUIDs after we've assigned the correct values to them. From Leon Romanovsky. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: net/core: Do not clear VF index for node/port GUIDs query mptcp: Fix undefined mptcp_handle_ipv6_mapped for modular IPV6 net: drop_monitor: Use kstrdup udp: document udp_rcv_segment special case for looped packets mptcp: MPTCP_HMAC_TEST should depend on MPTCP mptcp: Fix incorrect IPV6 dependency check Revert "MAINTAINERS: mptcp@ mailing list is moderated" mptcp: handle tcp fallback when using syn cookies mptcp: avoid a lockdep splat when mcast group was joined mptcp: fix panic on user pointer access mptcp: defer freeing of cached ext until last moment net: mvneta: fix XDP support if sw bm is used as fallback sch_choke: Use kvcalloc mptcp: Fix build with PROC_FS disabled. MAINTAINERS: mptcp@ mailing list is moderated	2020-01-30 07:42:00 -08:00
Leon Romanovsky	9fbf082f56	net/core: Do not clear VF index for node/port GUIDs query VF numbers were assigned to node_guid and port_guid, but cleared right before such query calls were issued. It caused to return node/port GUIDs of VF index 0 for all VFs. Fixes: `30aad41721` ("net/core: Add support for getting VF GUIDs") Reported-by: Adrian Chiris <adrianc@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-30 15:20:26 +01:00
Geert Uytterhoeven	31484d56ca	mptcp: Fix undefined mptcp_handle_ipv6_mapped for modular IPV6 If CONFIG_MPTCP=y, CONFIG_MPTCP_IPV6=n, and CONFIG_IPV6=m: ERROR: "mptcp_handle_ipv6_mapped" [net/ipv6/ipv6.ko] undefined! This does not happen if CONFIG_MPTCP_IPV6=y, as CONFIG_MPTCP_IPV6 selects CONFIG_IPV6, and thus forces CONFIG_IPV6 builtin. As exporting a symbol for an empty function would be a bit wasteful, fix this by providing a dummy version of mptcp_handle_ipv6_mapped() for the CONFIG_MPTCP_IPV6=n case. Rename mptcp_handle_ipv6_mapped() to mptcpv6_handle_mapped(), to make it clear this is a pure-IPV6 function, just like mptcpv6_init(). Fixes: `cec37a6e41` ("mptcp: Handle MP_CAPABLE options for outgoing connections") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-30 10:55:54 +01:00
Joe Perches	72d62c4e42	net: drop_monitor: Use kstrdup Convert the equivalent but rather odd uses of kmemdup with __GFP_ZERO to the more common kstrdup and avoid unnecessary zeroing of copied over memory. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-30 10:03:37 +01:00
Geert Uytterhoeven	389b8fb3c4	mptcp: MPTCP_HMAC_TEST should depend on MPTCP As the MPTCP HMAC test is integrated into the MPTCP code, it can be built only when MPTCP is enabled. Hence when MPTCP is disabled, asking the user if the test code should be enabled is futile. Wrap the whole block of MPTCP-specific config options inside a check for MPTCP. While at it, drop the "default n" for MPTCP_HMAC_TEST, as that is the default anyway. Fixes: `65492c5a6a` ("mptcp: move from sha1 (v0) to sha256 (v1)") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-30 09:55:37 +01:00
Geert Uytterhoeven	8e1974a2a0	mptcp: Fix incorrect IPV6 dependency check If CONFIG_MPTCP=y, CONFIG_MPTCP_IPV6=n, and CONFIG_IPV6=m: net/mptcp/protocol.o: In function `__mptcp_tcp_fallback': protocol.c:(.text+0x786): undefined reference to `inet6_stream_ops' Fix this by checking for CONFIG_MPTCP_IPV6 instead of CONFIG_IPV6, like is done in all other places in the mptcp code. Fixes: `8ab183deb2` ("mptcp: cope with later TCP fallback") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-30 09:55:37 +01:00
Linus Torvalds	22b17db4ea	y2038: core, driver and file system changes These are updates to device drivers and file systems that for some reason or another were not included in the kernel in the previous y2038 series. I've gone through all users of time_t again to make sure the kernel is in a long-term maintainable state, replacing all remaining references to time_t with safe alternatives. Some related parts of the series were picked up into the nfsd, xfs, alsa and v4l2 trees. A final set of patches in linux-mm removes the now unused time_t/timeval/timespec types and helper functions after all five branches are merged for linux-5.6, ensuring that no new users get merged. As a result, linux-5.6, or my backport of the patches to 5.4 [1], should be the first release that can serve as a base for a 32-bit system designed to run beyond year 2038, with a few remaining caveats: - All user space must be compiled with a 64-bit time_t, which will be supported in the coming musl-1.2 and glibc-2.32 releases, along with installed kernel headers from linux-5.6 or higher. - Applications that use the system call interfaces directly need to be ported to use the time64 syscalls added in linux-5.1 in place of the existing system calls. This impacts most users of futex() and seccomp() as well as programming languages that have their own runtime environment not based on libc. - Applications that use a private copy of kernel uapi header files or their contents may need to update to the linux-5.6 version, in particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h, linux/elfcore.h, linux/sockios.h, linux/timex.h and linux/can/bcm.h. - A few remaining interfaces cannot be changed to pass a 64-bit time_t in a compatible way, so they must be configured to use CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit timestamps. Most importantly this impacts all users of 'struct input_event'. - All y2038 problems that are present on 64-bit machines also apply to 32-bit machines. In particular this affects file systems with on-disk timestamps using signed 32-bit seconds: ext4 with ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs. Changes since v1 [2]: - Add Acks I received - Rebase to v5.5-rc1, dropping patches that got merged already - Add NFS, XFS and the final three patches from another series - Rewrite etnaviv patches - Add one late revert to avoid an etnaviv regression [1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame [2] https://lore.kernel.org/lkml/20191108213257.3097633-1-arnd@arndb.de/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJeMYy3AAoJEGCrR//JCVInEGwP/0R+S+ok7vw9OdLVT0lFl07D IcVabgOWf24imN7m7L7Mlt3nDfxIT4tMpiAXq7eMO3spcyViG18O2LXdSQ4/7QBp +BlhoMjOP9w34Jyd7mnkFr4vqQALvfIqkS8rFObDtDub2Rfj9PC36MRMIu8BPXlv RK8bigwJeH/DV38yc5/JeUcD+WuewYLsK9XPWN+4yB4vgGsNU3ZQQ6nnzbR3hMsN DN8WZ68Y7IBs0Kyxkf+s2zmRXtCa2RiFg/2TUsk5olVAJVaenvte69hq5RSbg1vW vLi6K8cBoPWL59nqCzcNE+TUhSUg3LOj/a/KWyl76yovz7AlJaNjssOf8ZjHw6sL MhQqz3hXTxiJDS2Jvbf1yojiYGlzrq/gqcRFGe9jPcZdieMc4/yZCx60G/Exa5Pu YdMcqMyDWPFyUAFQNWEF59HPheOdj6tb1KpJ6bwgCo3P7QqhLrU4z9w3Py4/ZfBO 4sWcWteSsD6MN/ADJ2WQ56nNxzM2AvkeVJKcF6FCkdngXX9T0GExmZz7SqB5Du99 9lNjIiD5E+LBa/Swo/7n49aYa8x06V1pmHYTZVh9Wkl+CZiO21umezQFrWsfaMTp xt3c6pFdMG5xNMGpreTAXOmf2R+T6O8IO2qQq/TYjzqOLH7QC830P7avkmml+cK1 LjOBE2TfSeO8Ru1dXV4t =wx0A -----END PGP SIGNATURE----- Merge tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull y2038 updates from Arnd Bergmann: "Core, driver and file system changes These are updates to device drivers and file systems that for some reason or another were not included in the kernel in the previous y2038 series. I've gone through all users of time_t again to make sure the kernel is in a long-term maintainable state, replacing all remaining references to time_t with safe alternatives. Some related parts of the series were picked up into the nfsd, xfs, alsa and v4l2 trees. A final set of patches in linux-mm removes the now unused time_t/timeval/timespec types and helper functions after all five branches are merged for linux-5.6, ensuring that no new users get merged. As a result, linux-5.6, or my backport of the patches to 5.4 [1], should be the first release that can serve as a base for a 32-bit system designed to run beyond year 2038, with a few remaining caveats: - All user space must be compiled with a 64-bit time_t, which will be supported in the coming musl-1.2 and glibc-2.32 releases, along with installed kernel headers from linux-5.6 or higher. - Applications that use the system call interfaces directly need to be ported to use the time64 syscalls added in linux-5.1 in place of the existing system calls. This impacts most users of futex() and seccomp() as well as programming languages that have their own runtime environment not based on libc. - Applications that use a private copy of kernel uapi header files or their contents may need to update to the linux-5.6 version, in particular for sound/asound.h, xfs/xfs_fs.h, linux/input.h, linux/elfcore.h, linux/sockios.h, linux/timex.h and linux/can/bcm.h. - A few remaining interfaces cannot be changed to pass a 64-bit time_t in a compatible way, so they must be configured to use CLOCK_MONOTONIC times or (with a y2106 problem) unsigned 32-bit timestamps. Most importantly this impacts all users of 'struct input_event'. - All y2038 problems that are present on 64-bit machines also apply to 32-bit machines. In particular this affects file systems with on-disk timestamps using signed 32-bit seconds: ext4 with ext3-style small inodes, ext2, xfs (to be fixed soon) and ufs" [1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=y2038-endgame * tag 'y2038-drivers-for-v5.6-signed' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (21 commits) Revert "drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC" y2038: sh: remove timeval/timespec usage from headers y2038: sparc: remove use of struct timex y2038: rename itimerval to __kernel_old_itimerval y2038: remove obsolete jiffies conversion functions nfs: fscache: use timespec64 in inode auxdata nfs: fix timstamp debug prints nfs: use time64_t internally sunrpc: convert to time64_t for expiry drm/etnaviv: avoid deprecated timespec drm/etnaviv: reject timeouts with tv_nsec >= NSEC_PER_SEC drm/msm: avoid using 'timespec' hfs/hfsplus: use 64-bit inode timestamps hostfs: pass 64-bit timestamps to/from user space packet: clarify timestamp overflow tsacct: add 64-bit btime field acct: stop using get_seconds() um: ubd: use 64-bit time_t where possible xtensa: ISS: avoid struct timeval dlm: use SO_SNDTIMEO_NEW instead of SO_SNDTIMEO_OLD ...	2020-01-29 14:55:47 -08:00
Kadlecsik József	5038517119	netfilter: ipset: fix suspicious RCU usage in find_set_and_id find_set_and_id() is called when the NFNL_SUBSYS_IPSET mutex is held. However, in the error path there can be a follow-up recvmsg() without the mutex held. Use the start() function of struct netlink_dump_control instead of dump() to verify and report if the specified set does not exist. Thanks to Pablo Neira Ayuso for helping me to understand the subleties of the netlink protocol. Reported-by: syzbot+fc69d7cb21258ab4ae4d@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-01-29 18:34:46 +01:00
Florian Westphal	ae2dd71649	mptcp: handle tcp fallback when using syn cookies We can't deal with syncookie mode yet, the syncookie rx path will create tcp reqsk, i.e. we get OOB access because we treat tcp reqsk as mptcp reqsk one: TCP: SYN flooding on port 20002. Sending cookies. BUG: KASAN: slab-out-of-bounds in subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191 Read of size 1 at addr ffff8881167bc148 by task syz-executor099/2120 subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191 tcp_get_cookie_sock+0xcf/0x520 net/ipv4/syncookies.c:209 cookie_v6_check+0x15a5/0x1e90 net/ipv6/syncookies.c:252 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1123 [inline] [..] Bug can be reproduced via "sysctl net.ipv4.tcp_syncookies=2". Note that MPTCP should work with syncookies (4th ack would carry needed state), but it appears better to sort that out in -next so do tcp fallback for now. I removed the MPTCP ifdef for tcp_rsk "is_mptcp" member because if (IS_ENABLED()) is easier to read than "#ifdef IS_ENABLED()/#endif" pair. Cc: Eric Dumazet <edumazet@google.com> Fixes: `cec37a6e41` ("mptcp: Handle MP_CAPABLE options for outgoing connections") Reported-by: Christoph Paasch <cpaasch@apple.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-29 17:45:20 +01:00
Florian Westphal	b2c5b614ca	mptcp: avoid a lockdep splat when mcast group was joined syzbot triggered following lockdep splat: ffffffff82d2cd40 (rtnl_mutex){+.+.}, at: ip_mc_drop_socket+0x52/0x180 but task is already holding lock: ffff8881187a2310 (sk_lock-AF_INET){+.+.}, at: mptcp_close+0x18/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (sk_lock-AF_INET){+.+.}: lock_acquire+0xee/0x230 lock_sock_nested+0x89/0xc0 do_ip_setsockopt.isra.0+0x335/0x22f0 ip_setsockopt+0x35/0x60 tcp_setsockopt+0x5d/0x90 __sys_setsockopt+0xf3/0x190 __x64_sys_setsockopt+0x61/0x70 do_syscall_64+0x72/0x300 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (rtnl_mutex){+.+.}: check_prevs_add+0x2b7/0x1210 __lock_acquire+0x10b6/0x1400 lock_acquire+0xee/0x230 __mutex_lock+0x120/0xc70 ip_mc_drop_socket+0x52/0x180 inet_release+0x36/0xe0 __sock_release+0xfd/0x130 __mptcp_close+0xa8/0x1f0 inet_release+0x7f/0xe0 __sock_release+0x69/0x130 sock_close+0x18/0x20 __fput+0x179/0x400 task_work_run+0xd5/0x110 do_exit+0x685/0x1510 do_group_exit+0x7e/0x170 __x64_sys_exit_group+0x28/0x30 do_syscall_64+0x72/0x300 entry_SYSCALL_64_after_hwframe+0x49/0xbe The trigger is: socket(AF_INET, SOCK_STREAM, 0x106 /* IPPROTO_MPTCP */) = 4 setsockopt(4, SOL_IP, MCAST_JOIN_GROUP, {gr_interface=7, gr_group={sa_family=AF_INET, sin_port=htons(20003), sin_addr=inet_addr("224.0.0.2")}}, 136) = 0 exit(0) Which results in a call to rtnl_lock while we are holding the parent mptcp socket lock via mptcp_close -> lock_sock(msk) -> inet_release -> ip_mc_drop_socket -> rtnl_lock(). >From lockdep point of view we thus have both 'rtnl_lock; lock_sock' and 'lock_sock; rtnl_lock'. Fix this by stealing the msk conn_list and doing the subflow close without holding the msk lock. Fixes: `cec37a6e41` ("mptcp: Handle MP_CAPABLE options for outgoing connections") Reported-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-29 17:45:20 +01:00
Florian Westphal	50e741bb3b	mptcp: fix panic on user pointer access Its not possible to call the kernel_(s\|g)etsockopt functions here, the address points to user memory: General protection fault in user access. Non-canonical address? WARNING: CPU: 1 PID: 5352 at arch/x86/mm/extable.c:77 ex_handler_uaccess+0xba/0xe0 arch/x86/mm/extable.c:77 Kernel panic - not syncing: panic_on_warn set ... [..] Call Trace: fixup_exception+0x9d/0xcd arch/x86/mm/extable.c:178 general_protection+0x2d/0x40 arch/x86/entry/entry_64.S:1202 do_ip_getsockopt+0x1f6/0x1860 net/ipv4/ip_sockglue.c:1323 ip_getsockopt+0x87/0x1c0 net/ipv4/ip_sockglue.c:1561 tcp_getsockopt net/ipv4/tcp.c:3691 [inline] tcp_getsockopt+0x8c/0xd0 net/ipv4/tcp.c:3685 kernel_getsockopt+0x121/0x1f0 net/socket.c:3736 mptcp_getsockopt+0x69/0x90 net/mptcp/protocol.c:830 __sys_getsockopt+0x13a/0x220 net/socket.c:2175 We can call tcp_get/setsockopt functions instead. Doing so fixes crashing, but still leaves rtnl related lockdep splat: WARNING: possible circular locking dependency detected 5.5.0-rc6 #2 Not tainted ------------------------------------------------------ syz-executor.0/16334 is trying to acquire lock: ffffffff84f7a080 (rtnl_mutex){+.+.}, at: do_ip_setsockopt.isra.0+0x277/0x3820 net/ipv4/ip_sockglue.c:644 but task is already holding lock: ffff888116503b90 (sk_lock-AF_INET){+.+.}, at: lock_sock include/net/sock.h:1516 [inline] ffff888116503b90 (sk_lock-AF_INET){+.+.}, at: mptcp_setsockopt+0x28/0x90 net/mptcp/protocol.c:1284 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (sk_lock-AF_INET){+.+.}: lock_sock_nested+0xca/0x120 net/core/sock.c:2944 lock_sock include/net/sock.h:1516 [inline] do_ip_setsockopt.isra.0+0x281/0x3820 net/ipv4/ip_sockglue.c:645 ip_setsockopt+0x44/0xf0 net/ipv4/ip_sockglue.c:1248 udp_setsockopt+0x5d/0xa0 net/ipv4/udp.c:2639 __sys_setsockopt+0x152/0x240 net/socket.c:2130 __do_sys_setsockopt net/socket.c:2146 [inline] __se_sys_setsockopt net/socket.c:2143 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2143 do_syscall_64+0xbd/0x5b0 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (rtnl_mutex){+.+.}: check_prev_add kernel/locking/lockdep.c:2475 [inline] check_prevs_add kernel/locking/lockdep.c:2580 [inline] validate_chain kernel/locking/lockdep.c:2970 [inline] __lock_acquire+0x1fb2/0x4680 kernel/locking/lockdep.c:3954 lock_acquire+0x127/0x330 kernel/locking/lockdep.c:4484 __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x158/0x1340 kernel/locking/mutex.c:1103 do_ip_setsockopt.isra.0+0x277/0x3820 net/ipv4/ip_sockglue.c:644 ip_setsockopt+0x44/0xf0 net/ipv4/ip_sockglue.c:1248 tcp_setsockopt net/ipv4/tcp.c:3159 [inline] tcp_setsockopt+0x8c/0xd0 net/ipv4/tcp.c:3153 kernel_setsockopt+0x121/0x1f0 net/socket.c:3767 mptcp_setsockopt+0x69/0x90 net/mptcp/protocol.c:1288 __sys_setsockopt+0x152/0x240 net/socket.c:2130 __do_sys_setsockopt net/socket.c:2146 [inline] __se_sys_setsockopt net/socket.c:2143 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2143 do_syscall_64+0xbd/0x5b0 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sk_lock-AF_INET); lock(rtnl_mutex); lock(sk_lock-AF_INET); lock(rtnl_mutex); The lockdep complaint is because we hold mptcp socket lock when calling the sk_prot get/setsockopt handler, and those might need to acquire the rtnl mutex. Normally, order is: rtnl_lock(sk) -> lock_sock Whereas for mptcp the order is lock_sock(mptcp_sk) rtnl_lock -> lock_sock(subflow_sk) We can avoid this by releasing the mptcp socket lock early, but, as Paolo points out, we need to get/put the subflow socket refcount before doing so to avoid race with concurrent close(). Fixes: `717e79c867` ("mptcp: Add setsockopt()/getsockopt() socket operations") Reported-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-29 17:45:19 +01:00
Florian Westphal	c9fd9c5f4b	mptcp: defer freeing of cached ext until last moment access to msk->cached_ext is only legal if the msk is locked or all concurrent accesses are impossible. Furthermore, once we start to tear down, we must make sure nothing else can step in and allocate a new cached ext. So place this code in the destroy callback where it belongs. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-29 17:45:19 +01:00
Joe Perches	793da4bfba	sch_choke: Use kvcalloc Convert the use of kvmalloc_array with __GFP_ZERO to the equivalent kvcalloc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-29 11:58:10 +01:00
David S. Miller	f6f7d8cf55	mptcp: Fix build with PROC_FS disabled. net/mptcp/subflow.c: In function ‘mptcp_subflow_create_socket’: net/mptcp/subflow.c:624:25: error: ‘struct netns_core’ has no member named ‘sock_inuse’ Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-29 10:39:23 +01:00
Linus Torvalds	bd2463ac7d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from David Miller: 1) Add WireGuard 2) Add HE and TWT support to ath11k driver, from John Crispin. 3) Add ESP in TCP encapsulation support, from Sabrina Dubroca. 4) Add variable window congestion control to TIPC, from Jon Maloy. 5) Add BCM84881 PHY driver, from Russell King. 6) Start adding netlink support for ethtool operations, from Michal Kubecek. 7) Add XDP drop and TX action support to ena driver, from Sameeh Jubran. 8) Add new ipv4 route notifications so that mlxsw driver does not have to handle identical routes itself. From Ido Schimmel. 9) Add BPF dynamic program extensions, from Alexei Starovoitov. 10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes. 11) Add support for macsec HW offloading, from Antoine Tenart. 12) Add initial support for MPTCP protocol, from Christoph Paasch, Matthieu Baerts, Florian Westphal, Peter Krystad, and many others. 13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu Cherian, and others. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits) net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC udp: segment looped gso packets correctly netem: change mailing list qed: FW 8.42.2.0 debug features qed: rt init valid initialization changed qed: Debug feature: ilt and mdump qed: FW 8.42.2.0 Add fw overlay feature qed: FW 8.42.2.0 HSI changes qed: FW 8.42.2.0 iscsi/fcoe changes qed: Add abstraction for different hsi values per chip qed: FW 8.42.2.0 Additional ll2 type qed: Use dmae to write to widebus registers in fw_funcs qed: FW 8.42.2.0 Parser offsets modified qed: FW 8.42.2.0 Queue Manager changes qed: FW 8.42.2.0 Expose new registers and change windows qed: FW 8.42.2.0 Internal ram offsets modifications MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver Documentation: net: octeontx2: Add RVU HW and drivers overview octeontx2-pf: ethtool RSS config support octeontx2-pf: Add basic ethtool support ...	2020-01-28 16:02:33 -08:00
Linus Torvalds	c677124e63	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "These were the main changes in this cycle: - More -rt motivated separation of CONFIG_PREEMPT and CONFIG_PREEMPTION. - Add more low level scheduling topology sanity checks and warnings to filter out nonsensical topologies that break scheduling. - Extend uclamp constraints to influence wakeup CPU placement - Make the RT scheduler more aware of asymmetric topologies and CPU capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y - Make idle CPU selection more consistent - Various fixes, smaller cleanups, updates and enhancements - please see the git log for details" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) sched/fair: Define sched_idle_cpu() only for SMP configurations sched/topology: Assert non-NUMA topology masks don't (partially) overlap idle: fix spelling mistake "iterrupts" -> "interrupts" sched/fair: Remove redundant call to cpufreq_update_util() sched/psi: create /proc/pressure and /proc/pressure/{io\|memory\|cpu} only when psi enabled sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP sched/fair: calculate delta runnable load only when it's needed sched/cputime: move rq parameter in irqtime_account_process_tick stop_machine: Make stop_cpus() static sched/debug: Reset watchdog on all CPUs while processing sysrq-t sched/core: Fix size of rq::uclamp initialization sched/uclamp: Fix a bug in propagating uclamp value in new cgroups sched/fair: Load balance aggressively for SCHED_IDLE CPUs sched/fair : Improve update_sd_pick_busiest for spare capacity case watchdog: Remove soft_lockup_hrtimer_cnt and related code sched/rt: Make RT capacity-aware sched/fair: Make EAS wakeup placement consider uclamp restrictions sched/fair: Make task_fits_capacity() consider uclamp restrictions sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with() sched/uclamp: Make uclamp util helpers use and return UL values ...	2020-01-28 10:07:09 -08:00
Linus Torvalds	c0e809e244	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - Ftrace is one of the last W^X violators (after this only KLP is left). These patches move it over to the generic text_poke() interface and thereby get rid of this oddity. This requires a surprising amount of surgery, by Peter Zijlstra. - x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to count certain types of events that have a special, quirky hw ABI (by Kim Phillips) - kprobes fixes by Masami Hiramatsu Lots of tooling updates as well, the following subcommands were updated: annotate/report/top, c2c, clang, record, report/top TUI, sched timehist, tests; plus updates were done to the gtk ui, libperf, headers and the parser" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) perf/x86/amd: Add support for Large Increment per Cycle Events perf/x86/amd: Constrain Large Increment per Cycle events perf/x86/intel/rapl: Add Comet Lake support tracing: Initialize ret in syscall_enter_define_fields() perf header: Use last modification time for timestamp perf c2c: Fix return type for histogram sorting comparision functions perf beauty sockaddr: Fix augmented syscall format warning perf/ui/gtk: Fix gtk2 build perf ui gtk: Add missing zalloc object perf tools: Use %define api.pure full instead of %pure-parser libperf: Setup initial evlist::all_cpus value perf report: Fix no libunwind compiled warning break s390 issue perf tools: Support --prefix/--prefix-strip perf report: Clarify in help that --children is default tools build: Fix test-clang.cpp with Clang 8+ perf clang: Fix build with Clang 9 kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic tools lib: Fix builds when glibc contains strlcpy() perf report/top: Make 'e' visible in the help and make it toggle showing callchains perf report/top: Do not offer annotation for symbols without samples ...	2020-01-28 09:44:15 -08:00
Linus Torvalds	d99391ec2b	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The RCU changes in this cycle were: - Expedited grace-period updates - kfree_rcu() updates - RCU list updates - Preemptible RCU updates - Torture-test updates - Miscellaneous fixes - Documentation updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits) rcu: Remove unused stop-machine #include powerpc: Remove comment about read_barrier_depends() .mailmap: Add entries for old paulmck@kernel.org addresses srcu: Apply *_ONCE() to ->srcu_last_gp_end rcu: Switch force_qs_rnp() to for_each_leaf_node_cpu_mask() rcu: Move rcu_{expedited,normal} definitions into rcupdate.h rcu: Move gp_state_names[] and gp_state_getname() to tree_stall.h rcu: Remove the declaration of call_rcu() in tree.h rcu: Fix tracepoint tracking RCU CPU kthread utilization rcu: Fix harmless omission of "CONFIG_" from #if condition rcu: Avoid tick_dep_set_cpu() misordering rcu: Provide wrappers for uses of ->rcu_read_lock_nesting rcu: Use READ_ONCE() for ->expmask in rcu_read_unlock_special() rcu: Clear ->rcu_read_unlock_special only once rcu: Clear .exp_hint only when deferred quiescent state has been reported rcu: Rename some instance of CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU rcu: Remove kfree_call_rcu_nobatch() rcu: Remove kfree_rcu() special casing and lazy-callback handling rcu: Add support for debug_objects debugging for kfree_rcu() rcu: Add multiple in-flight batches of kfree_rcu() work ...	2020-01-28 08:46:13 -08:00
David S. Miller	9e0703a265	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2020-01-27 The following pull-request contains BPF updates for your net-next tree. We've added 20 non-merge commits during the last 5 day(s) which contain a total of 24 files changed, 433 insertions(+), 104 deletions(-). The main changes are: 1) Make BPF trampolines and dispatcher aware for the stack unwinder, from Jiri Olsa. 2) Improve handling of failed CO-RE relocations in libbpf, from Andrii Nakryiko. 3) Several fixes to BPF sockmap and reuseport selftests, from Lorenz Bauer. 4) Various cleanups in BPF devmap's XDP flush code, from John Fastabend. 5) Fix BPF flow dissector when used with port ranges, from Yoshiki Komachi. 6) Fix bpffs' map_seq_next callback to always inc position index, from Vasily Averin. 7) Allow overriding LLVM tooling for runqslower utility, from Andrey Ignatov. 8) Silence false-positive lockdep splats in devmap hash lookup, from Amol Grover. 9) Fix fentry/fexit selftests to initialize a variable before use, from John Sperbeck. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-27 14:31:40 +01:00
David S. Miller	c312840cd7	Revert "pktgen: Allow configuration of IPv6 source address range" This reverts commit `7786a1af2a`. It causes build failures on 32-bit, for example: net/core/pktgen.o: In function `mod_cur_headers': >> pktgen.c:(.text.mod_cur_headers+0xba0): undefined reference to `__umoddi3' Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-27 13:49:33 +01:00
Leon Romanovsky	6a7e25c7fb	net/core: Replace driver version to be kernel version In order to stop useless driver version bumps and unify output presented by ethtool -i, let's set default version string. As Linus said in [1]: "Things are supposed to be backwards and forwards compatible, because we don't accept breakage in user space anyway. So versioning is pointless, and only causes problems." They cause problems when users start to see version changes and expect specific set of features which will be different for stable@, vanilla and distribution kernels. Distribution kernels are based on some kernel version with extra patches on top, for example, in RedHat world this "extra" is a lot and for them your driver version say nothing. Users who run vanilla kernels won't use driver version information too, because running such kernels requires knowledge and understanding. Another set of problems are related to difference in versioning scheme and such doesn't allow to write meaningful automation which will work sanely on all ethtool capable devices. Before this change: [leonro@erver ~]$ ethtool -i eth0 driver: virtio_net version: 1.0.0 After this change and once ->version assignment will be deleted from virtio_net: [leonro@server ~]$ ethtool -i eth0 driver: virtio_net version: 5.5.0-rc6+ Link: https://lore.kernel.org/ksummit-discuss/CA+55aFx9A=5cc0QZ7CySC4F2K7eYaEfzkdYEc9JaNgCcV25=rg@mail.gmail.com/ Link: https://lore.kernel.org/linux-rdma/20200122152627.14903-1-michal.kalderon@marvell.com/T/#md460ff8f976c532a89d6860411c3c50bb811038b Link: https://lore.kernel.org/linux-rdma/20200127060835.GA570@unicorn.suse.cz Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-27 13:47:22 +01:00
Michal Kubecek	67bffa7923	ethtool: add WOL_NTF notification Send ETHTOOL_MSG_WOL_NTF notification whenever wake-on-lan settings of a device are modified using ETHTOOL_MSG_WOL_SET netlink message or ETHTOOL_SWOL ioctl request. As notifications can be received by anyone, do not include SecureOn(tm) password in notification messages. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-27 11:31:36 +01:00
Michal Kubecek	8d425b19b3	ethtool: set wake-on-lan settings with WOL_SET request Implement WOL_SET netlink request to set wake-on-lan settings. This is equivalent to ETHTOOL_SWOL ioctl request. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-27 11:31:36 +01:00
Michal Kubecek	51ea22b04e	ethtool: provide WoL settings with WOL_GET request Implement WOL_GET request to get wake-on-lan settings for a device, traditionally available via ETHTOOL_GWOL ioctl request. As part of the implementation, provide symbolic names for wake-on-line modes as ETH_SS_WOL_MODES string set. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-27 11:31:36 +01:00
Michal Kubecek	0bda7af39d	ethtool: add DEBUG_NTF notification Send ETHTOOL_MSG_DEBUG_NTF notification message whenever debugging message mask for a device are modified using ETHTOOL_MSG_DEBUG_SET netlink message or ETHTOOL_SMSGLVL ioctl request. The notification message has the same format as reply to DEBUG_GET request. As with other ethtool notifications, netlink requests only trigger the notification if the mask is actually changed while ioctl request trigger it whenever the request results in calling the ethtool_ops handler. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-27 11:31:36 +01:00
Michal Kubecek	e54d04e3af	ethtool: set message mask with DEBUG_SET request Implement DEBUG_SET netlink request to set debugging settings for a device. At the moment, only message mask corresponding to message level as set by ETHTOOL_SMSGLVL ioctl request can be set. (It is called message level in ioctl interface but almost all drivers interpret it as a bit mask.) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-27 11:31:35 +01:00
Michal Kubecek	6a94b8ccf6	ethtool: provide message mask with DEBUG_GET request Implement DEBUG_GET request to get debugging settings for a device. At the moment, only message mask corresponding to message level as reported by ETHTOOL_GMSGLVL ioctl request is provided. (It is called message level in ioctl interface but almost all drivers interpret it as a bit mask.) As part of the implementation, provide symbolic names for message mask bits as ETH_SS_MSG_CLASSES string set. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-27 11:31:35 +01:00
Michal Kubecek	d2c4b444fd	ethtool: fix kernel-doc descriptions Fix missing or incorrect function argument and struct member descriptions. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-27 11:31:35 +01:00
Yoshiki Komachi	59fb9b62fb	flow_dissector: Fix to use new variables for port ranges in bpf hook This patch applies new flag (FLOW_DISSECTOR_KEY_PORTS_RANGE) and field (tp_range) to BPF flow dissector to generate appropriate flow keys when classified by specified port ranges. Fixes: `8ffb055bea` ("cls_flower: Fix the behavior using port ranges with hw-offload") Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Petar Penkov <ppenkov@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200117070533.402240-2-komachi.yoshiki@gmail.com	2020-01-27 11:25:07 +01:00
David S. Miller	c4c57b974d	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2020-01-26 Here's (probably) the last bluetooth-next pull request for the 5.6 kernel. - Initial pieces of Bluetooth 5.2 Isochronous Channels support - mgmt: Various cleanups and a new Set Blocked Keys command - btusb: Added support for 04ca:3021 QCA_ROME device - hci_qca: Multiple fixes & cleanups - hci_bcm: Fixes & improved device tree support - Fixed attempts to create duplicate debugfs entries Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-27 11:24:46 +01:00
Vladimir Oltean	6dc43cd3aa	net: dsa: Fix use-after-free in probing of DSA switch tree DSA sets up a switch tree little by little. Every switch of the N members of the tree calls dsa_register_switch, and (N - 1) will just touch the dst->ports list with their ports and quickly exit. Only the last switch that calls dsa_register_switch will find all DSA links complete in dsa_tree_setup_routing_table, and not return zero as a result but instead go ahead and set up the entire DSA switch tree (practically on behalf of the other switches too). The trouble is that the (N - 1) switches don't clean up after themselves after they get an error such as EPROBE_DEFER. Their footprint left in dst->ports by dsa_switch_touch_ports is still there. And switch N, the one responsible with actually setting up the tree, is going to work with those stale dp, dp->ds and dp->ds->dev pointers. In particular ds and ds->dev might get freed by the device driver. Be there a 2-switch tree and the following calling order: - Switch 1 calls dsa_register_switch - Calls dsa_switch_touch_ports, populates dst->ports - Calls dsa_port_parse_cpu, gets -EPROBE_DEFER, exits. - Switch 2 calls dsa_register_switch - Calls dsa_switch_touch_ports, populates dst->ports - Probe doesn't get deferred, so it goes ahead. - Calls dsa_tree_setup_routing_table, which returns "complete == true" due to Switch 1 having called dsa_switch_touch_ports before. - Because the DSA links are complete, it calls dsa_tree_setup_switches now. - dsa_tree_setup_switches iterates through dst->ports, initializing the Switch 1 ds structure (invalid) and the Switch 2 ds structure (valid). - Undefined behavior (use after free, sometimes NULL pointers, etc). Real example below (debugging prints added by me, as well as guards against NULL pointers): [ 5.477947] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.313002] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.319932] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.329693] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.339458] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.349226] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.358991] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.368758] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.378524] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.388291] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.398057] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.407912] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.417682] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.427446] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.437212] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.446979] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.456744] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.466512] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.476277] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.486043] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.495810] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.505577] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.515433] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.354120] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.361045] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.370805] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.380571] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.390337] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.400104] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.409872] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.419637] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.429403] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.439169] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803db15b80 (dev ffffff803d8e4800) The solution is to recognize that the functions that call dsa_switch_touch_ports (dsa_switch_parse_of, dsa_switch_parse) have side effects, and therefore one should clean up their side effects on error path. The cleanup of dst->ports was taken from dsa_switch_remove and moved into a dedicated dsa_switch_release_ports function, which should really be per-switch (free only the members of dst->ports that are also members of ds, instead of all switch ports). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-27 11:12:46 +01:00

1 2 3 4 5 ...

59057 Commits