linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-26 14:15:12 +07:00

Author	SHA1	Message	Date
Pablo Neira Ayuso	c29f74e0df	netfilter: nf_flow_table: hardware offload support This patch adds the dataplane hardware offload to the flowtable infrastructure. Three new flags represent the hardware state of this flow: * FLOW_OFFLOAD_HW: This flow entry resides in the hardware. * FLOW_OFFLOAD_HW_DYING: This flow entry has been scheduled to be remove from hardware. This might be triggered by either packet path (via TCP RST/FIN packet) or via aging. * FLOW_OFFLOAD_HW_DEAD: This flow entry has been already removed from the hardware, the software garbage collector can remove it from the software flowtable. This patch supports for: * IPv4 only. * Aging via FLOW_CLS_STATS, no packet and byte counter synchronization at this stage. This patch also adds the action callback that specifies how to convert the flow entry into the flow_rule object that is passed to the driver. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-12 19:42:26 -08:00
David S. Miller	e0580b50d9	linux-can-next-for-5.5-20191111 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEmvEkXzgOfc881GuFWsYho5HknSAFAl3J1ekTHG1rbEBwZW5n dXRyb25peC5kZQAKCRBaxiGjkeSdILUKB/9mYalSK5f4n/AalnsEoGy3IWVy/xwG KvAdZAueH2uFIhu6MfqmWbs9xp94kGsI9RP1o3UDzwLxBaol8cG97oi0T1Fn5670 LSzBkR5WXEVaerw7noGwVEvi/IKkSCO7pOI3PszwITHU+P/w/EChVLyeWnzDoTaf bzSXbtXQTOCINepIic+f1COffyJA4o84uiEFwpkuTJPbiihxr43bGrwj0xb1cWC9 Zh464vGuA7s2/Xqhsmg/Z/BU5ihWIz3/wEAqKUW99dj1rUpgdaChTFEO1femXInX WkOi6r4EZGsAB0Cbbp6hTkovK9UF94asVMOW74Yz9UK9dxEakcwmR7XK =S7hM -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-5.5-20191111' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2019-10-07 this is a pull request for net-next/master consisting of 32 patches. The first patch is by Gustavo A. R. Silva and removes unused code in the generic CAN infrastructure. The next three patches target the mcp251x driver. The one by Andy Shevchenko removes the legacy platform data support from the driver. The other two are by Timo Schlüßler and reset the device only when needed, to prevent glitches on the output when GPIO support is added. I'm contributing two patches fixing checkpatch warnings in the c_can_platform and peak_canfd driver. Stephane Grosjean's patch for the peak_canfd driver adds hw timestamps support in rx skbs. The next three patches target the xilinx_can driver. One patch by me to fix checkpatch warnings, one patch by Anssi Hannula to avoid non requested bus error frames, and a patch by YueHaibing that switches the driver to devm_platform_ioremap_resource(). Pankaj Sharma contributes two patches for the m_can driver, the first one adds support for one shot mode, the other support for handling arbitration errors. Followed by four patches by YueHaibing, switching the grcan, ifi, rcar, and sun4i drivers to devm_platform_ioremap_resource() I'm contributing cleanup patches for the rx-offload helper, while Joakim Zhang's patch prepares the rx-offload helper for CAN-FD support. The rx offload users flexcan and ti_hecc are converted accordingly. The remaining twelve patches target the flexcan driver. First Joakim Zhang switches the driver to devm_platform_ioremap_resource(). The remaining eleven patch are by me and clean up the abstract the access of the iflag1 and iflag2 register both for RX and TX mailboxes. This is a preparation for the upcoming CAN-FD support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-12 12:27:05 -08:00
Russell King	6c08670223	net: sfp: fix sfp_bus_add_upstream() warning When building with SFP disabled, the stub for sfp_bus_add_upstream() missed "inline". Add it. Fixes: `727b3668b7` ("net: sfp: rework upstream interface") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-12 11:13:41 -08:00
Joakim Zhang	4e9c9484b0	can: rx-offload: Prepare for CAN FD support The skbs for classic CAN and CAN FD frames are allocated with seperate functions: alloc_can_skb() and alloc_canfd_skb(). In order to support CAN FD frames via the rx-offload helper, the driver itself has to allocate the skb (depending whether it received a classic CAN or CAN FD frame), as the rx-offload helper cannot know which kind of CAN frame the driver has received. This patch moves the allocation of the skb into the struct can_rx_offload::mailbox_read callbacks of the the flexcan and ti_hecc driver and adjusts the rx-offload helper accordingly. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-11-11 21:58:10 +01:00
Marc Kleine-Budde	61d2350615	can: rx-offload: can_rx_offload_reset(): remove no-op function This patch removes the function can_rx_offload_reset(), as it does nothing. If we ever need this function, add it back again. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-11-11 21:58:10 +01:00
Andy Shevchenko	50ec88120e	can: mcp251x: get rid of legacy platform data Instead of using legacy platform data, switch to use device properties. For clock frequency we are using well established clock-frequency property. Users, two for now, are also converted here. Cc: Daniel Mack <daniel@zonque.org> Cc: Haojian Zhuang <haojian.zhuang@gmail.com> Cc: Robert Jarzmik <robert.jarzmik@free.fr> Cc: Russell King <linux@armlinux.org.uk> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-11-11 21:57:28 +01:00
Russell King	727b3668b7	net: sfp: rework upstream interface The current upstream interface is an all-or-nothing, which is sub-optimal for future changes, as it doesn't allow the upstream driver to prepare for the SFP module becoming available, as it is at boot. Switch to a find-sfp-bus, add-upstream, del-upstream, put-sfp-bus interface structure instead, which allows the upstream driver to prepare for a module being available as soon as add-upstream is called. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-09 19:35:53 -08:00
David S. Miller	14684b9301	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net One conflict in the BPF samples Makefile, some fixes in 'net' whilst we were converting over to Makefile.target rules in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-09 11:04:37 -08:00
Linus Torvalds	0058b0a506	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) BPF sample build fixes from Björn Töpel 2) Fix powerpc bpf tail call implementation, from Eric Dumazet. 3) DCCP leaks jiffies on the wire, fix also from Eric Dumazet. 4) Fix crash in ebtables when using dnat target, from Florian Westphal. 5) Fix port disable handling whne removing bcm_sf2 driver, from Florian Fainelli. 6) Fix kTLS sk_msg trim on fallback to copy mode, from Jakub Kicinski. 7) Various KCSAN fixes all over the networking, from Eric Dumazet. 8) Memory leaks in mlx5 driver, from Alex Vesker. 9) SMC interface refcounting fix, from Ursula Braun. 10) TSO descriptor handling fixes in stmmac driver, from Jose Abreu. 11) Add a TX lock to synchonize the kTLS TX path properly with crypto operations. From Jakub Kicinski. 12) Sock refcount during shutdown fix in vsock/virtio code, from Stefano Garzarella. 13) Infinite loop in Intel ice driver, from Colin Ian King. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (108 commits) ixgbe: need_wakeup flag might not be set for Tx i40e: need_wakeup flag might not be set for Tx igb/igc: use ktime accessors for skb->tstamp i40e: Fix for ethtool -m issue on X722 NIC iavf: initialize ITRN registers with correct values ice: fix potential infinite loop because loop counter being too small qede: fix NULL pointer deref in __qede_remove() net: fix data-race in neigh_event_send() vsock/virtio: fix sock refcnt holding during the shutdown net: ethernet: octeon_mgmt: Account for second possible VLAN header mac80211: fix station inactive_time shortly after boot net/fq_impl: Switch to kvmalloc() for memory allocation mac80211: fix ieee80211_txq_setup_flows() failure path ipv4: Fix table id reference in fib_sync_down_addr ipv6: fixes rt6_probe() and fib6_nh->last_probe init net: hns: Fix the stray netpoll locks causing deadlock in NAPI path net: usb: qmi_wwan: add support for DW5821e with eSIM support CDC-NCM: handle incomplete transfer of MTU nfc: netlink: fix double device reference drop NFC: st21nfca: fix double free ...	2019-11-08 18:21:05 -08:00
Linus Torvalds	410ef736a7	XArray updates for 5.4 These patches all fix various bugs, some of which people have tripped over and some of which have been caught by automatic tools. Matthew Wilcox (Oracle) (5): XArray: Fix xas_next() with a single entry at 0 idr: Fix idr_get_next_ul race with idr_remove radix tree: Remove radix_tree_iter_find idr: Fix integer overflow in idr_for_each_entry idr: Fix idr_alloc_u32 on 32-bit systems -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAl3E4tgACgkQDpNsjXcp gj7YjQf6ArvGGHp3U+w1TRA4KCIrtUdGY4nceDQSYaJ2IVus+fHDQnwMCJb5Rjzw 3aFZKLrrsaWWGKTqqRDKD4zm6I6Mg1239WNCnJ8VQrSRepNQ7WxVXGFn560NDZ5b u7zYXBm3CtlJpkX9JVbokii4LkjuwXzbuSh6cv+X3APBUQ3JXuGBmT7p2PLp0ol9 lNKUrxZCK+CJ7kJo5W81lCzZc6GY2USqwmuqudGACWMm1K24TRL52PeD8NU6IzKc Mw9c7Osa0TlwjSaxObaRgLYzIZQoNbkrMTg0xNr8GZjJIn/yJIxqOBb4k3mZWQF1 5KmLfpLotItt25MP8jxgx+1N03jjvw== =h6eN -----END PGP SIGNATURE----- Merge tag 'xarray-5.4' of git://git.infradead.org/users/willy/linux-dax Pull XArray fixes from Matthew Wilcox: "These all fix various bugs, some of which people have tripped over and some of which have been caught by automatic tools" * tag 'xarray-5.4' of git://git.infradead.org/users/willy/linux-dax: idr: Fix idr_alloc_u32 on 32-bit systems idr: Fix integer overflow in idr_for_each_entry radix tree: Remove radix_tree_iter_find idr: Fix idr_get_next_ul race with idr_remove XArray: Fix xas_next() with a single entry at 0	2019-11-08 08:46:49 -08:00
Eric Dumazet	f8cc62ca3e	net: add a READ_ONCE() in skb_peek_tail() skb_peek_tail() can be used without protection of a lock, as spotted by KCSAN [1] In order to avoid load-stearing, add a READ_ONCE() Note that the corresponding WRITE_ONCE() are already there. [1] BUG: KCSAN: data-race in sk_wait_data / skb_queue_tail read to 0xffff8880b36a4118 of 8 bytes by task 20426 on cpu 1: skb_peek_tail include/linux/skbuff.h:1784 [inline] sk_wait_data+0x15b/0x250 net/core/sock.c:2477 kcm_wait_data+0x112/0x1f0 net/kcm/kcmsock.c:1103 kcm_recvmsg+0xac/0x320 net/kcm/kcmsock.c:1130 sock_recvmsg_nosec net/socket.c:871 [inline] sock_recvmsg net/socket.c:889 [inline] sock_recvmsg+0x92/0xb0 net/socket.c:885 ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480 do_recvmmsg+0x19a/0x5c0 net/socket.c:2601 __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680 __do_sys_recvmmsg net/socket.c:2703 [inline] __se_sys_recvmmsg net/socket.c:2696 [inline] __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x44/0xa9 write to 0xffff8880b36a4118 of 8 bytes by task 451 on cpu 0: __skb_insert include/linux/skbuff.h:1852 [inline] __skb_queue_before include/linux/skbuff.h:1958 [inline] __skb_queue_tail include/linux/skbuff.h:1991 [inline] skb_queue_tail+0x7e/0xc0 net/core/skbuff.c:3145 kcm_queue_rcv_skb+0x202/0x310 net/kcm/kcmsock.c:206 kcm_rcv_strparser+0x74/0x4b0 net/kcm/kcmsock.c:370 __strp_recv+0x348/0xf50 net/strparser/strparser.c:309 strp_recv+0x84/0xa0 net/strparser/strparser.c:343 tcp_read_sock+0x174/0x5c0 net/ipv4/tcp.c:1639 strp_read_sock+0xd4/0x140 net/strparser/strparser.c:366 do_strp_work net/strparser/strparser.c:414 [inline] strp_work+0x9a/0xe0 net/strparser/strparser.c:423 process_one_work+0x3d4/0x890 kernel/workqueue.c:2269 worker_thread+0xa0/0x800 kernel/workqueue.c:2415 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 451 Comm: kworker/u4:3 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: kstrp strp_work Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-07 20:08:14 -08:00
Eric Dumazet	fd2f473787	net: use u64_stats_t in struct pcpu_lstats In order to fix the data-race found by KCSAN, we can use the new u64_stats_t type and its accessors instead of plain u64 fields. This will still generate optimal code for both 32 and 64 bit platforms. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-07 20:03:08 -08:00
Eric Dumazet	316580b69d	u64_stats: provide u64_stats_t type On 64bit arches, struct u64_stats_sync is empty and provides no help against load/store tearing. Using READ_ONCE()/WRITE_ONCE() would be needed. But the update side would be slightly more expensive. local64_t was defined so that we could use regular adds in a manner which is atomic wrt IRQs. However the u64_stats infra means we do not have to use local64_t on 32bit arches since the syncp provides the needed protection. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-07 20:03:08 -08:00
Eric Dumazet	dd5382a081	net: provide dev_lstats_add() helper Many network drivers need it and hand-coded the same function. In order to ease u64_stats_t adoption, it is time to factorize. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-07 20:03:08 -08:00
Eric Dumazet	de7d5084d8	net: provide dev_lstats_read() helper Many network drivers use hand-coded implementation of the same thing, let's factorize things so that u64_stats_t adoption is done once. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-07 20:03:08 -08:00
Yang Shi	169226f7e0	mm: thp: handle page cache THP correctly in PageTransCompoundMap We have a usecase to use tmpfs as QEMU memory backend and we would like to take the advantage of THP as well. But, our test shows the EPT is not PMD mapped even though the underlying THP are PMD mapped on host. The number showed by /sys/kernel/debug/kvm/largepage is much less than the number of PMD mapped shmem pages as the below: 7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted) Size: 4194304 kB [snip] AnonHugePages: 0 kB ShmemPmdMapped: 579584 kB [snip] Locked: 0 kB cat /sys/kernel/debug/kvm/largepages 12 And some benchmarks do worse than with anonymous THPs. By digging into the code we figured out that commit `127393fbe5` ("mm: thp: kvm: fix memory corruption in KVM with THP enabled") checks if there is a single PTE mapping on the page for anonymous THP when setting up EPT map. But the _mapcount < 0 check doesn't work for page cache THP since every subpage of page cache THP would get _mapcount inc'ed once it is PMD mapped, so PageTransCompoundMap() always returns false for page cache THP. This would prevent KVM from setting up PMD mapped EPT entry. So we need handle page cache THP correctly. However, when page cache THP's PMD gets split, kernel just remove the map instead of setting up PTE map like what anonymous THP does. Before KVM calls get_user_pages() the subpages may get PTE mapped even though it is still a THP since the page cache THP may be mapped by other processes at the mean time. Checking its _mapcount and whether the THP has PTE mapped or not. Although this may report some false negative cases (PTE mapped by other processes), it looks not trivial to make this accurate. With this fix /sys/kernel/debug/kvm/largepage would show reasonable pages are PMD mapped by EPT as the below: 7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted) Size: 4194304 kB [snip] AnonHugePages: 0 kB ShmemPmdMapped: 557056 kB [snip] Locked: 0 kB cat /sys/kernel/debug/kvm/largepages 271 And the benchmarks are as same as anonymous THPs. [yang.shi@linux.alibaba.com: v4] Link: http://lkml.kernel.org/r/1571865575-42913-1-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1571769577-89735-1-git-send-email-yang.shi@linux.alibaba.com Fixes: `dd78fedde4` ("rmap: support file thp") Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reported-by: Gang Deng <gavin.dg@linux.alibaba.com> Tested-by: Gang Deng <gavin.dg@linux.alibaba.com> Suggested-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> [4.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-11-06 08:28:58 -08:00
Jakub Kicinski	683916f6a8	net/tls: fix sk_msg trim on fallback to copy mode sk_msg_trim() tries to only update curr pointer if it falls into the trimmed region. The logic, however, does not take into the account pointer wrapping that sk_msg_iter_var_prev() does nor (as John points out) the fact that msg->sg is a ring buffer. This means that when the message was trimmed completely, the new curr pointer would have the value of MAX_MSG_FRAGS - 1, which is neither smaller than any other value, nor would it actually be correct. Special case the trimming to 0 length a little bit and rework the comparison between curr and end to take into account wrapping. This bug caused the TLS code to not copy all of the message, if zero copy filled in fewer sg entries than memcopy would need. Big thanks to Alexander Potapenko for the non-KMSAN reproducer. v2: - take into account that msg->sg is a ring buffer (John). Link: https://lore.kernel.org/netdev/20191030160542.30295-1-jakub.kicinski@netronome.com/ (v1) Fixes: `d829e9c411` ("tls: convert to generic sk_msg interface") Reported-by: syzbot+f8495bff23a879a6d0bd@syzkaller.appspotmail.com Reported-by: syzbot+6f50c99e8f6194bf363f@syzkaller.appspotmail.com Co-developed-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-05 18:07:47 -08:00
David S. Miller	41de23e223	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2019-11-02 The following pull-request contains BPF updates for your net tree. We've added 6 non-merge commits during the last 6 day(s) which contain a total of 8 files changed, 35 insertions(+), 9 deletions(-). The main changes are: 1) Fix ppc BPF JIT's tail call implementation by performing a second pass to gather a stable JIT context before opcode emission, from Eric Dumazet. 2) Fix build of BPF samples sys_perf_event_open() usage to compiled out unavailable test_attr__{enabled,open} checks. Also fix potential overflows in bpf_map_{area_alloc,charge_init} on 32 bit archs, from Björn Töpel. 3) Fix narrow loads of bpf_sysctl context fields with offset > 0 on big endian archs like s390x and also improve the test coverage, from Ilya Leoshkevich. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-05 17:38:21 -08:00
Matteo Croce	15122464d5	icmp: add helpers to recognize ICMP error packets Add two helper functions, one for IPv4 and one for IPv6, to recognize the ICMP packets which are error responses. This packets are special because they have as payload the original header of the packet which generated it (RFC 792 says at least 8 bytes, but Linux actually includes much more than that). Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-05 14:03:11 -08:00
Andrew Lunn	0c65b2b90d	net: of_get_phy_mode: Change API to solve int/unit warnings Before this change of_get_phy_mode() returned an enum, phy_interface_t. On error, -ENODEV etc, is returned. If the result of the function is stored in a variable of type phy_interface_t, and the compiler has decided to represent this as an unsigned int, comparision with -ENODEV etc, is a signed vs unsigned comparision. Fix this problem by changing the API. Make the function return an error, or 0 on success, and pass a pointer, of type phy_interface_t, where the phy mode should be stored. v2: Return with *interface set to PHY_INTERFACE_MODE_NA on error. Add error checks to all users of of_get_phy_mode() Fixup a few reverse christmas tree errors Fixup a few slightly malformed reverse christmas trees v3: Fix 0-day reported errors. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-04 11:21:25 -08:00
Matthew Wilcox (Oracle)	f6341c5af4	idr: Fix integer overflow in idr_for_each_entry If there is an entry at INT_MAX then idr_for_each_entry() will increment id after handling it. This is undefined behaviour, and is caught by UBSAN. Adding 1U to id forces the operation to be carried out as an unsigned addition which (when assigned to id) will result in INT_MIN. Since there is never an entry stored at INT_MIN, idr_get_next() will return NULL, ending the loop as expected. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2019-11-03 06:36:43 -05:00
David S. Miller	ae8a76fb8b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-11-02 The following pull-request contains BPF updates for your net-next tree. We've added 30 non-merge commits during the last 7 day(s) which contain a total of 41 files changed, 1864 insertions(+), 474 deletions(-). The main changes are: 1) Fix long standing user vs kernel access issue by introducing bpf_probe_read_user() and bpf_probe_read_kernel() helpers, from Daniel. 2) Accelerated xskmap lookup, from Björn and Maciej. 3) Support for automatic map pinning in libbpf, from Toke. 4) Cleanup of BTF-enabled raw tracepoints, from Alexei. 5) Various fixes to libbpf and selftests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-02 15:29:58 -07:00
David S. Miller	d31e95585c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net The only slightly tricky merge conflict was the netdevsim because the mutex locking fix overlapped a lot of driver reload reorganization. The rest were (relatively) trivial in nature. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-02 13:54:56 -07:00
Daniel Borkmann	75a1a607bb	uaccess: Add strict non-pagefault kernel-space read function Add two new probe_kernel_read_strict() and strncpy_from_unsafe_strict() helpers which by default alias to the __probe_kernel_read() and the __strncpy_from_unsafe(), respectively, but can be overridden by archs which have non-overlapping address ranges for kernel space and user space in order to bail out with -EFAULT when attempting to probe user memory including non-canonical user access addresses [0]: 4-level page tables: user-space mem: 0x0000000000000000 - 0x00007fffffffffff non-canonical: 0x0000800000000000 - 0xffff7fffffffffff 5-level page tables: user-space mem: 0x0000000000000000 - 0x00ffffffffffffff non-canonical: 0x0100000000000000 - 0xfeffffffffffffff The idea is that these helpers are complementary to the probe_user_read() and strncpy_from_unsafe_user() which probe user-only memory. Both added helpers here do the same, but for kernel-only addresses. Both set of helpers are going to be used for BPF tracing. They also explicitly avoid throwing the splat for non-canonical user addresses from `00c42373d3` ("x86-64: add warning for non-canonical user access address dereferences"). For compat, the current probe_kernel_read() and strncpy_from_unsafe() are left as-is. [0] Documentation/x86/x86_64/mm.txt Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: x86@kernel.org Link: https://lore.kernel.org/bpf/eefeefd769aa5a013531f491a71f0936779e916b.1572649915.git.daniel@iogearbox.net	2019-11-02 12:39:12 -07:00
Daniel Borkmann	1d1585ca0f	uaccess: Add non-pagefault user-space write function Commit `3d7081822f` ("uaccess: Add non-pagefault user-space read functions") missed to add probe write function, therefore factor out a probe_write_common() helper with most logic of probe_kernel_write() except setting KERNEL_DS, and add a new probe_user_write() helper so it can be used from BPF side. Again, on some archs, the user address space and kernel address space can co-exist and be overlapping, so in such case, setting KERNEL_DS would mean that the given address is treated as being in kernel address space. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/bpf/9df2542e68141bfa3addde631441ee45503856a8.1572649915.git.daniel@iogearbox.net	2019-11-02 12:39:12 -07:00
Matthew Wilcox (Oracle)	797060ec42	radix tree: Remove radix_tree_iter_find This API is unsafe to use under the RCU lock. With no in-tree users remaining, remove it to prevent future bugs. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2019-11-01 22:26:34 -04:00
Linus Torvalds	1204c70d9d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Fix free/alloc races in batmanadv, from Sven Eckelmann. 2) Several leaks and other fixes in kTLS support of mlx5 driver, from Tariq Toukan. 3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke Høiland-Jørgensen. 4) Add an r8152 device ID, from Kazutoshi Noguchi. 5) Missing include in ipv6's addrconf.c, from Ben Dooks. 6) Use siphash in flow dissector, from Eric Dumazet. Attackers can easily infer the 32-bit secret otherwise etc. 7) Several netdevice nesting depth fixes from Taehee Yoo. 8) Fix several KCSAN reported errors, from Eric Dumazet. For example, when doing lockless skb_queue_empty() checks, and accessing sk_napi_id/sk_incoming_cpu lockless as well. 9) Fix jumbo packet handling in RXRPC, from David Howells. 10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet. 11) Fix DMA synchronization in gve driver, from Yangchun Fu. 12) Several bpf offload fixes, from Jakub Kicinski. 13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo. 14) Fix ping latency during high traffic rates in hisilicon driver, from Jiangfent Xiao. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits) net: fix installing orphaned programs net: cls_bpf: fix NULL deref on offload filter removal selftests: bpf: Skip write only files in debugfs selftests: net: reuseport_dualstack: fix uninitalized parameter r8169: fix wrong PHY ID issue with RTL8168dp net: dsa: bcm_sf2: Fix IMP setup for port different than 8 net: phylink: Fix phylink_dbg() macro gve: Fixes DMA synchronization. inet: stop leaking jiffies on the wire ixgbe: Remove duplicate clear_bit() call Documentation: networking: device drivers: Remove stray asterisks e1000: fix memory leaks i40e: Fix receive buffer starvation for AF_XDP igb: Fix constant media auto sense switching when no cable is connected net: ethernet: arc: add the missed clk_disable_unprepare igb: Enable media autosense for the i350. igb/igc: Don't warn on fatal read failures when the device is removed tcp: increase tcp_max_syn_backlog max value net: increase SOMAXCONN to 4096 netdevsim: Fix use-after-free during device dismantle ...	2019-11-01 17:48:11 -07:00
Linus Torvalds	372bf6c1c8	NFS Client Bugfixes for Linux 5.4-rc6 Stable bugfixes: - Fix an RCU lock leak in nfs4_refresh_delegation_stateid() Other fixes: - The TCP back channel mustn't disappear while requests are outstanding - The RDMA back channel mustn't disappear while requests are outstanding - Destroy the back channel when we destroy the host transport - Don't allow a cached open with a revoked delegation -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl28mZ4ACgkQ18tUv7Cl QOvAAxAA5h15NJ8OTOpr/kyWQAHEYWIEWQyTNYT8Gi/TI/YCfauET6kxAKg7ETk+ Bx6+A7Q5OlKSqFZXf2UMMIHPnvH7Ar06uIof6DLjWxA9N/q57SNW0YszQCUhvWjp 4Ry7d9A2yNrCizEk/vTmY5zfFIo2S88AvwkQ6gLCEyfIn5REWQyQnS4SS3vL7MuS E/iWBsXFV2C9/yON+zzZnC0ewMIPbtWA3/CsVI2zDKLsHKvBYOx6n7TbfTtMtc/p mCdgLzdEEZlN0KOCzzMp7ziME8y1qUGX6eo17Jrr0ZBZIM7o5d7sYMA88Apu6dMg MQM5iEEAf//kmHLWzQ6yg8XZEZrXFMOiGKYu0kWt4pVlqi78gemEfE6T8dClsB5m P7jqn/4ntDlJPWPciImsVEzDOHlzlZBIXwZs1JjufeO/hIB570cs80PjeDQP0id8 Cx3yR7Hr1ldKzwSwpWrtIaqEPljJJw0jhxQ7YYlvfbA1Ogd4ek4QU4RuvyW11CRr d/5Oaa7xvjGs63klu8HpZG4NHFby3PPmZT6t4xUcH75MR14S15YCGtaUvPJ7ORck C6tZJAkVtJqSoAjsGuBtTc5qiOdo+hMRghutW4Pd6UpnPYJKwQUtTqp3MbfM4+bA S19tGG6qn+7cl7bW4NJpNy8HfbC7BdS+m7maHpQfGIdMsxKVFoI= =ttcU -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client bugfixes from Anna Schumaker: "This contains two delegation fixes (with the RCU lock leak fix marked for stable), and three patches to fix destroying the the sunrpc back channel. Stable bugfixes: - Fix an RCU lock leak in nfs4_refresh_delegation_stateid() Other fixes: - The TCP back channel mustn't disappear while requests are outstanding - The RDMA back channel mustn't disappear while requests are outstanding - Destroy the back channel when we destroy the host transport - Don't allow a cached open with a revoked delegation" * tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid() NFSv4: Don't allow a cached open with a revoked delegation SUNRPC: Destroy the back channel when we destroy the host transport SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding SUNRPC: The TCP back channel mustn't disappear while requests are outstanding	2019-11-01 17:37:44 -07:00
Björn Töpel	d817991cc7	xsk: Restructure/inline XSKMAP lookup/redirect/flush In this commit the XSKMAP entry lookup function used by the XDP redirect code is moved from the xskmap.c file to the xdp_sock.h header, so the lookup can be inlined from, e.g., the bpf_xdp_redirect_map() function. Further the __xsk_map_redirect() and __xsk_map_flush() is moved to the xsk.c, which lets the compiler inline the xsk_rcv() and xsk_flush() functions. Finally, all the XDP socket functions were moved from linux/bpf.h to net/xdp_sock.h, where most of the XDP sockets functions are anyway. This yields a ~2% performance boost for the xdpsock "rx_drop" scenario. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191101110346.15004-4-bjorn.topel@gmail.com	2019-11-02 00:38:49 +01:00
Linus Torvalds	355f83c1d0	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes: an ABI fix for a reserved field, AMD IBS fixes, an Intel uncore PMU driver fix and a header typo fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/headers: Fix spelling s/EACCESS/EACCES/, s/privilidge/privilege/ perf/x86/uncore: Fix event group support perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h) perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity perf/core: Start rejecting the syscall with attr.__reserved_2 set	2019-11-01 11:40:47 -07:00
Linus Torvalds	b2a18c25c7	Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "Various fixes all over the map: prevent boot crashes on HyperV, classify UEFI randomness as bootloader randomness, fix EFI boot for the Raspberry Pi2, fix efi_test permissions, etc" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN x86, efi: Never relocate kernel below lowest acceptable address efi: libstub/arm: Account for firmware reserved memory at the base of RAM efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness efi/tpm: Return -EINVAL when determining tpm final events log size fails efi: Make CONFIG_EFI_RCI2_TABLE selectable on x86 only	2019-11-01 11:32:50 -07:00
Ioana Ciornei	1ac210d128	bus: fsl-mc: add the fsl_mc_get_endpoint function Using the newly added fsl_mc_get_endpoint function a fsl-mc driver can find its associated endpoint (another object at the other link of a MC firmware link). The API will be used in the following patch in order to discover the connected DPMAC object of a DPNI. Also, the fsl_mc_device_lookup function is made available to the entire fsl-mc bus driver and not just for the dprc driver. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 14:19:45 -07:00
Eric Dumazet	19f92a030c	net: increase SOMAXCONN to 4096 SOMAXCONN is /proc/sys/net/core/somaxconn default value. It has been defined as 128 more than 20 years ago. Since it caps the listen() backlog values, the very small value has caused numerous problems over the years, and many people had to raise it on their hosts after beeing hit by problems. Google has been using 1024 for at least 15 years, and we increased this to 4096 after TCP listener rework has been completed, more than 4 years ago. We got no complain of this change breaking any legacy application. Many applications indeed setup a TCP listener with listen(fd, -1); meaning they let the system select the backlog. Raising SOMAXCONN lowers chance of the port being unavailable under even small SYNFLOOD attack, and reduces possibilities of side channel vulnerabilities. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Yue Cao <ycao009@ucr.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 14:01:40 -07:00
Björn Töpel	ff1c08e1f7	bpf: Change size to u64 for bpf_map_{area_alloc, charge_init}() The functions bpf_map_area_alloc() and bpf_map_charge_init() prior this commit passed the size parameter as size_t. In this commit this is changed to u64. All users of these functions avoid size_t overflows on 32-bit systems, by explicitly using u64 when calculating the allocation size and memory charge cost. However, since the result was narrowed by the size_t when passing size and cost to the functions, the overflow handling was in vain. Instead of changing all call sites to size_t and handle overflow at the call site, the parameter is changed to u64 and checked in the functions above. Fixes: `d407bd25a2` ("bpf: don't trigger OOM killer under pressure with map alloc") Fixes: `c85d69135a` ("bpf: move memory size checks to bpf_map_charge_init()") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Link: https://lore.kernel.org/bpf/20191029154307.23053-1-bjorn.topel@gmail.com	2019-10-31 21:41:33 +01:00
Vikas Gupta	246880958a	firmware: broadcom: add OP-TEE based BNXT f/w manager This driver registers on TEE bus to interact with OP-TEE based BNXT firmware management modules Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Sheetal Tigadoli <sheetal.tigadoli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-31 11:00:45 -07:00
Alexei Starovoitov	f1b9509c2f	bpf: Replace prog_raw_tp+btf_id with prog_tracing The bpf program type raw_tp together with 'expected_attach_type' was the most appropriate api to indicate BTF-enabled raw_tp programs. But during development it became apparent that 'expected_attach_type' cannot be used and new 'attach_btf_id' field had to be introduced. Which means that the information is duplicated in two fields where one of them is ignored. Clean it up by introducing new program type where both 'expected_attach_type' and 'attach_btf_id' fields have specific meaning. In the future 'expected_attach_type' will be extended with other attach points that have similar semantics to raw_tp. This patch is replacing BTF-enabled BPF_PROG_TYPE_RAW_TRACEPOINT with prog_type = BPF_RPOG_TYPE_TRACING expected_attach_type = BPF_TRACE_RAW_TP attach_btf_id = btf_id of raw tracepoint inside the kernel Future patches will add expected_attach_type = BPF_TRACE_FENTRY or BPF_TRACE_FEXIT where programs have the same input context and the same helpers, but different attach points. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191030223212.953010-2-ast@kernel.org	2019-10-31 15:16:59 +01:00
Javier Martinez Canillas	359efcc2c9	efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN The driver exposes EFI runtime services to user-space through an IOCTL interface, calling the EFI services function pointers directly without using the efivar API. Disallow access to the /dev/efi_test character device when the kernel is locked down to prevent arbitrary user-space to call EFI runtime services. Also require CAP_SYS_ADMIN to open the chardev to prevent unprivileged users to call the EFI runtime services, instead of just relying on the chardev file mode bits for this. The main user of this driver is the fwts [0] tool that already checks if the effective user ID is 0 and fails otherwise. So this change shouldn't cause any regression to this tool. [0]: https://wiki.ubuntu.com/FirmwareTestSuite/Reference/uefivarinfo Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Laszlo Ersek <lersek@redhat.com> Acked-by: Matthew Garrett <mjg59@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-7-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-31 09:40:21 +01:00
Kairui Song	220dd7699c	x86, efi: Never relocate kernel below lowest acceptable address Currently, kernel fails to boot on some HyperV VMs when using EFI. And it's a potential issue on all x86 platforms. It's caused by broken kernel relocation on EFI systems, when below three conditions are met: 1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR) by the loader. 2. There isn't enough room to contain the kernel, starting from the default load address (eg. something else occupied part the region). 3. In the memmap provided by EFI firmware, there is a memory region starts below LOAD_PHYSICAL_ADDR, and suitable for containing the kernel. EFI stub will perform a kernel relocation when condition 1 is met. But due to condition 2, EFI stub can't relocate kernel to the preferred address, so it fallback to ask EFI firmware to alloc lowest usable memory region, got the low region mentioned in condition 3, and relocated kernel there. It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR. This is the lowest acceptable kernel relocation address. The first thing goes wrong is in arch/x86/boot/compressed/head_64.S. Kernel decompression will force use LOAD_PHYSICAL_ADDR as the output address if kernel is located below it. Then the relocation before decompression, which move kernel to the end of the decompression buffer, will overwrite other memory region, as there is no enough memory there. To fix it, just don't let EFI stub relocate the kernel to any address lower than lowest acceptable address. [ ardb: introduce efi_low_alloc_above() to reduce the scope of the change ] Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-6-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-31 09:40:19 +01:00
Linus Torvalds	e472c64aa4	dmaengine fixes for v5.4-rc6 Few fixes on the dmaengine drivers: - fix in sprd driver for link list and potential memory leak - tegra transfer failure fix - imx size check fix for script_number - xilinx fix for 64bit AXIDMA and control reg update - qcom bam dma resource leak fix - cppi slave transfer fix when idle -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAl26hEgACgkQfBQHDyUj g0cAjRAAweFLgnBti+Cqb3mjbHxYfSknw/4CNis5XvhXfCGUupgqQQfL1mjH1ZGv GPz0lOjgkaUwtioxMrIqGSc/7MCsHOK02kktfY87SSraSQrKF2Hl9Y0KRYu8FIWW AFvmbhXAtLqRY87r6K/S7LzQzSrYjfGHxHV5ShsA18+2Y2gtuQQ2V6nbkjZquOle So27cjAZXYQJP9YI2DJXpcgwe2DSO2cgfcLH58t6Prw4/piJf7P1Qhs/ng5Egfbe RahP8AdTSjRf2ylVHngaM5EFuDsjn/uslEfh93dL4gQH/NWqNgTbk02R0HtwrRjs F8EVNsDqVjWrc1D/tKWUucYVMDdGHmNovN0sGESbuVBgUKwqjrhLMrEzxs2d0cgN h2YRbPnAHRmO/F0x789TGFsw3BU0T/gAl+LrVFUBCKvhwtEtsG6rmX1wjSVca3lM zMhV9oipMUBPSVQQhyIf/yizEqw0YkgFKelQppb6e/I+OBgQGYchgaiQtbNcI4L2 jOkQ7dsFPL8a8bmN+r38IzJMjHp5F+Govuc+1nmhuALpGxgBtmEqzgzBiP3AZgcJ XjkKW3IQdK+TAcqA6eDFLlm33Npe+flXQ/QvJlpYfHLDzZx2CNQExaFVo4meNwDu iGajKgP9DhO6l7uaep2mDxMPz7u/edhfvHy6cjWUIH9KjwPZ64w= =e87E -----END PGP SIGNATURE----- Merge tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma Pull dmaengine fixes from Vinod Koul: "A few fixes to the dmaengine drivers: - fix in sprd driver for link list and potential memory leak - tegra transfer failure fix - imx size check fix for script_number - xilinx fix for 64bit AXIDMA and control reg update - qcom bam dma resource leak fix - cppi slave transfer fix when idle" * tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: cppi41: Fix cppi41_dma_prep_slave_sg() when idle dmaengine: qcom: bam_dma: Fix resource leak dmaengine: sprd: Fix the possible memory leak issue dmaengine: xilinx_dma: Fix control reg update in vdma_channel_set_config dmaengine: xilinx_dma: Fix 64-bit simple AXIDMA transfer dmaengine: imx-sdma: fix size check for sdma script_number dmaengine: tegra210-adma: fix transfer failure dmaengine: sprd: Fix the link-list pointer register configuration issue	2019-10-31 07:34:09 +00:00
Trond Myklebust	669996add4	SUNRPC: Destroy the back channel when we destroy the host transport When we're destroying the host transport mechanism, we should ensure that we do not leak memory by failing to release any back channel slots that might still exist. Reported-by: Neil Brown <neilb@suse.de> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2019-10-30 12:04:35 -04:00
Roi Dayan	6dfef396ea	net/mlx5: Fix flow counter list auto bits struct The union should contain the extended dest and counter list. Remove the resevered 0x40 bits which is redundant. This change doesn't break any functionally. Everything works today because the code in fs_cmd.c is using the correct structs if extended dest or the basic dest. Fixes: `1b11549859` ("net/mlx5: Introduce extended destination fields") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-10-29 16:27:17 -07:00
Tejun Heo	20eb4f29b6	net: fix sk_page_frag() recursion from memory reclaim sk_page_frag() optimizes skb_frag allocations by using per-task skb_frag cache when it knows it's the only user. The condition is determined by seeing whether the socket allocation mask allows blocking - if the allocation may block, it obviously owns the task's context and ergo exclusively owns current->task_frag. Unfortunately, this misses recursion through memory reclaim path. Please take a look at the following backtrace. [2] RIP: 0010:tcp_sendmsg_locked+0xccf/0xe10 ... tcp_sendmsg+0x27/0x40 sock_sendmsg+0x30/0x40 sock_xmit.isra.24+0xa1/0x170 [nbd] nbd_send_cmd+0x1d2/0x690 [nbd] nbd_queue_rq+0x1b5/0x3b0 [nbd] __blk_mq_try_issue_directly+0x108/0x1b0 blk_mq_request_issue_directly+0xbd/0xe0 blk_mq_try_issue_list_directly+0x41/0xb0 blk_mq_sched_insert_requests+0xa2/0xe0 blk_mq_flush_plug_list+0x205/0x2a0 blk_flush_plug_list+0xc3/0xf0 [1] blk_finish_plug+0x21/0x2e _xfs_buf_ioapply+0x313/0x460 __xfs_buf_submit+0x67/0x220 xfs_buf_read_map+0x113/0x1a0 xfs_trans_read_buf_map+0xbf/0x330 xfs_btree_read_buf_block.constprop.42+0x95/0xd0 xfs_btree_lookup_get_block+0x95/0x170 xfs_btree_lookup+0xcc/0x470 xfs_bmap_del_extent_real+0x254/0x9a0 __xfs_bunmapi+0x45c/0xab0 xfs_bunmapi+0x15/0x30 xfs_itruncate_extents_flags+0xca/0x250 xfs_free_eofblocks+0x181/0x1e0 xfs_fs_destroy_inode+0xa8/0x1b0 destroy_inode+0x38/0x70 dispose_list+0x35/0x50 prune_icache_sb+0x52/0x70 super_cache_scan+0x120/0x1a0 do_shrink_slab+0x120/0x290 shrink_slab+0x216/0x2b0 shrink_node+0x1b6/0x4a0 do_try_to_free_pages+0xc6/0x370 try_to_free_mem_cgroup_pages+0xe3/0x1e0 try_charge+0x29e/0x790 mem_cgroup_charge_skmem+0x6a/0x100 __sk_mem_raise_allocated+0x18e/0x390 __sk_mem_schedule+0x2a/0x40 [0] tcp_sendmsg_locked+0x8eb/0xe10 tcp_sendmsg+0x27/0x40 sock_sendmsg+0x30/0x40 ___sys_sendmsg+0x26d/0x2b0 __sys_sendmsg+0x57/0xa0 do_syscall_64+0x42/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 In [0], tcp_send_msg_locked() was using current->page_frag when it called sk_wmem_schedule(). It already calculated how many bytes can be fit into current->page_frag. Due to memory pressure, sk_wmem_schedule() called into memory reclaim path which called into xfs and then IO issue path. Because the filesystem in question is backed by nbd, the control goes back into the tcp layer - back into tcp_sendmsg_locked(). nbd sets sk_allocation to (GFP_NOIO \| __GFP_MEMALLOC) which makes sense - it's in the process of freeing memory and wants to be able to, e.g., drop clean pages to make forward progress. However, this confused sk_page_frag() called from [2]. Because it only tests whether the allocation allows blocking which it does, it now thinks current->page_frag can be used again although it already was being used in [0]. After [2] used current->page_frag, the offset would be increased by the used amount. When the control returns to [0], current->page_frag's offset is increased and the previously calculated number of bytes now may overrun the end of allocated memory leading to silent memory corruptions. Fix it by adding gfpflags_normal_context() which tests sleepable && !reclaim and use it to determine whether to use current->task_frag. v2: Eric didn't like gfp flags being tested twice. Introduce a new helper gfpflags_normal_context() and combine the two tests. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-28 16:17:31 -07:00
Eric Dumazet	d7d16a8935	net: add skb_queue_empty_lockless() Some paths call skb_queue_empty() without holding the queue lock. We must use a barrier in order to not let the compiler do strange things, and avoid KCSAN splats. Adding a barrier in skb_queue_empty() might be overkill, I prefer adding a new helper to clearly identify points where the callers might be lockless. This might help us finding real bugs. The corresponding WRITE_ONCE() should add zero cost for current compilers. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-28 13:33:41 -07:00
Linus Torvalds	9e5eefba3d	virtio: fixes Some minor fixes Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJdtqXyAAoJECgfDbjSjVRpJ3kH/008+Y5aDaWcWP4//49epuu/ dldFu073fppcNeYjRLOLEGPLHok3w2uh4zsBRZ+hrK8qiH7yjZDxxzLrbBWKINRE HpZRp5Xdtmdu7QRWUMBiGHTdU/CIY/tUpGRy2d+k8LxeTBcrZIqj/9ixrPRW6sh0 0iXPQJtHtDAbbMjbs86s9ScwfcY/mwxLFf23168DnwE44pk86faFGASstWx+XSKl DU14t3eQnYX33R089lDMP+ohKLsqMXyD160WWZDPg+Ml2GhzyGGWixcDmZn3k42q NBCOsfLL8uxdATZlWMlv4xX1ykiNcOfijBnp8eutsLx4JyIWFxqVF0bXO3GsztM= =xHx4 -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio fixes from Michael Tsirkin: "Some minor fixes" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vringh: fix copy direction of vringh_iov_push_kern() vsock/virtio: remove unused 'work' field from 'struct virtio_vsock_pkt' virtio_ring: fix stalls for packed rings	2019-10-28 12:47:22 +01:00
Geert Uytterhoeven	652521d460	perf/headers: Fix spelling s/EACCESS/EACCES/, s/privilidge/privilege/ As per POSIX, the correct spelling of the error code is EACCES: include/uapi/asm-generic/errno-base.h:#define EACCES 13 /* Permission denied */ Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Kosina <trivial@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/20191024122904.12463-1-geert+renesas@glider.be Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-10-28 11:02:01 +01:00
Stefano Garzarella	6771596169	vsock/virtio: remove unused 'work' field from 'struct virtio_vsock_pkt' The 'work' field was introduced with commit `06a8fc7836` ("VSOCK: Introduce virtio_vsock_common.ko") but it is never used in the code, so we can remove it to save memory allocated in the per-packet 'struct virtio_vsock_pkt' Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-10-28 04:25:04 -04:00
David S. Miller	5b7fe93db0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2019-10-27 The following pull-request contains BPF updates for your net-next tree. We've added 52 non-merge commits during the last 11 day(s) which contain a total of 65 files changed, 2604 insertions(+), 1100 deletions(-). The main changes are: 1) Revolutionize BPF tracing by using in-kernel BTF to type check BPF assembly code. The work here teaches BPF verifier to recognize kfree_skb()'s first argument as 'struct sk_buff *' in tracepoints such that verifier allows direct use of bpf_skb_event_output() helper used in tc BPF et al (w/o probing memory access) that dumps skb data into perf ring buffer. Also add direct loads to probe memory in order to speed up/replace bpf_probe_read() calls, from Alexei Starovoitov. 2) Big batch of changes to improve libbpf and BPF kselftests. Besides others: generalization of libbpf's CO-RE relocation support to now also include field existence relocations, revamp the BPF kselftest Makefile to add test runner concept allowing to exercise various ways to build BPF programs, and teach bpf_object__open() and friends to automatically derive BPF program type/expected attach type from section names to ease their use, from Andrii Nakryiko. 3) Fix deadlock in stackmap's build-id lookup on rq_lock(), from Song Liu. 4) Allow to read BTF as raw data from bpftool. Most notable use case is to dump /sys/kernel/btf/vmlinux through this, from Jiri Olsa. 5) Use bpf_redirect_map() helper in libbpf's AF_XDP helper prog which manages to improve "rx_drop" performance by ~4%., from Björn Töpel. 6) Fix to restore the flow dissector after reattach BPF test and also fix error handling in bpf_helper_defs.h generation, from Jakub Sitnicki. 7) Improve verifier's BTF ctx access for use outside of raw_tp, from Martin KaFai Lau. 8) Improve documentation for AF_XDP with new sections and to reflect latest features, from Magnus Karlsson. 9) Add back 'version' section parsing to libbpf for old kernels, from John Fastabend. 10) Fix strncat bounds error in libbpf's libbpf_prog_type_by_name(), from KP Singh. 11) Turn on -mattr=+alu32 in LLVM by default for BPF kselftests in order to improve insn coverage for built BPF progs, from Yonghong Song. 12) Misc minor cleanups and fixes, from various others. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-26 22:57:27 -07:00
David S. Miller	1a51a47491	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2019-10-27 The following pull-request contains BPF updates for your net tree. We've added 7 non-merge commits during the last 11 day(s) which contain a total of 7 files changed, 66 insertions(+), 16 deletions(-). The main changes are: 1) Fix two use-after-free bugs in relation to RCU in jited symbol exposure to kallsyms, from Daniel Borkmann. 2) Fix NULL pointer dereference in AF_XDP rx-only sockets, from Magnus Karlsson. 3) Fix hang in netdev unregister for hash based devmap as well as another overflow bug on 32 bit archs in memlock cost calculation, from Toke Høiland-Jørgensen. 4) Fix wrong memory access in LWT BPF programs on reroute due to invalid dst. Also fix BPF selftests to use more compatible nc options, from Jiri Benc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-26 18:30:55 -07:00
Linus Torvalds	13fa692e3f	Driver core fix for 5.4-rc5 Here is a single sysfs fix for 5.4-rc5. It resolves an error if you actually try to use the __BIN_ATTR_WO() macro, seems I never tested it properly before :( This has been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXbSJpg8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ynoggCgt6YYqlZ1N51QSE7Lf57QN83IOOoAnjih4LWX b1q3d4JxuKEfGXdwgl3V =CrXh -----END PGP SIGNATURE----- Merge tag 'driver-core-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fix from Greg KH: "Here is a single sysfs fix for 5.4-rc5. It resolves an error if you actually try to use the __BIN_ATTR_WO() macro, seems I never tested it properly before :( This has been in linux-next for a while with no reported issues" * tag 'driver-core-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: sysfs: Fixes __BIN_ATTR_WO() macro	2019-10-26 15:23:08 -04:00
David S. Miller	4b1f5ddaff	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next, more specifically: * Updates for ipset: 1) Coding style fix for ipset comment extension, from Jeremy Sowden. 2) De-inline many functions in ipset, from Jeremy Sowden. 3) Move ipset function definition from header to source file. 4) Move ip_set_put_flags() to source, export it as a symbol, remove inline. 5) Move range_to_mask() to the source file where this is used. 6) Move ip_set_get_ip_port() to the source file where this is used. * IPVS selftests and netns improvements: 7) Two patches to speedup ipvs netns dismantle, from Haishuang Yan. 8) Three patches to add selftest script for ipvs, also from Haishuang Yan. * Conntrack updates and new nf_hook_slow_list() function: 9) Document ct ecache extension, from Florian Westphal. 10) Skip ct extensions from ctnetlink dump, from Florian. 11) Free ct extension immediately, from Florian. 12) Skip access to ecache extension from nf_ct_deliver_cached_events() this is not correct as reported by Syzbot. 13) Add and use nf_hook_slow_list(), from Florian. * Flowtable infrastructure updates: 14) Move priority to nf_flowtable definition. 15) Dynamic allocation of per-device hooks in flowtables. 16) Allow to include netdevice only once in flowtable definitions. 17) Rise maximum number of devices per flowtable. * Netfilter hardware offload infrastructure updates: 18) Add nft_flow_block_chain() helper function. 19) Pass callback list to nft_setup_cb_call(). 20) Add nft_flow_cls_offload_setup() helper function. 21) Remove rules for the unregistered device via netdevice event. 22) Support for multiple devices in a basechain definition at the ingress hook. 22) Add nft_chain_offload_cmd() helper function. 23) Add nft_flow_block_offload_init() helper function. 24) Rewind in case of failing to bind multiple devices to hook. 25) Typo in IPv6 tproxy module description, from Norman Rasmussen. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-26 11:35:43 -07:00

1 2 3 4 5 ...

68807 Commits