linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-23 06:43:06 +07:00

Author	SHA1	Message	Date
Willem de Bruijn	8f19f12bdc	selftests: in udpgso_bench do not test udp zerocopy The udpgso benchmark compares various configurations of UDP and TCP. Including one that is not upstream, udp zerocopy. This is a leftover from the earlier RFC patchset. The test is part of kselftests and run in continuous spinners. Remove the failing case to make the test start passing. Fixes: `3a687bef14` ("selftests: udp gso benchmark") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:56:50 -07:00
Willem de Bruijn	993675a310	packet: reset network header if packet shorter than ll reserved space If variable length link layer headers result in a packet shorter than dev->hard_header_len, reset the network header offset. Else skb->mac_len may exceed skb->len after skb_mac_reset_len. packet_sendmsg_spkt already has similar logic. Fixes: `b84bbaf7a6` ("packet: in packet_snd start writing at link layer allocation") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:55:59 -07:00
Willem de Bruijn	bab2c80e5a	nsh: set mac len based on inner packet When pulling the NSH header in nsh_gso_segment, set the mac length based on the encapsulated packet type. skb_reset_mac_len computes an offset to the network header, which here still points to the outer packet: > skb_reset_network_header(skb); > [...] > __skb_pull(skb, nsh_len); > skb_reset_mac_header(skb); // now mac hdr starts nsh_len == 8B after net hdr > skb_reset_mac_len(skb); // mac len = net hdr - mac hdr == (u16) -8 == 65528 > [..] > skb_mac_gso_segment(skb, ..) Link: http://lkml.kernel.org/r/CAF=yD-KeAcTSOn4AxirAxL8m7QAS8GBBe1w09eziYwvPbbUeYA@mail.gmail.com Reported-by: syzbot+7b9ed9872dab8c32305d@syzkaller.appspotmail.com Fixes: `c411ed8545` ("nsh: add GSO support") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:55:29 -07:00
Stefano Brivio	8b7008620b	net: Don't copy pfmemalloc flag in __copy_skb_header() The pfmemalloc flag indicates that the skb was allocated from the PFMEMALLOC reserves, and the flag is currently copied on skb copy and clone. However, an skb copied from an skb flagged with pfmemalloc wasn't necessarily allocated from PFMEMALLOC reserves, and on the other hand an skb allocated that way might be copied from an skb that wasn't. So we should not copy the flag on skb copy, and rather decide whether to allow an skb to be associated with sockets unrelated to page reclaim depending only on how it was allocated. Move the pfmemalloc flag before headers_start[0] using an existing 1-bit hole, so that __copy_skb_header() doesn't copy it. When cloning, we'll now take care of this flag explicitly, contravening to the warning comment of __skb_clone(). While at it, restore the newline usage introduced by commit `b193722731` ("net: reorganize sk_buff for faster __copy_skb_header()") to visually separate bytes used in bitfields after headers_start[0], that was gone after commit `a9e419dc7b` ("netfilter: merge ctinfo into nfct pointer storage area"), and describe the pfmemalloc flag in the kernel-doc structure comment. This doesn't change the size of sk_buff or cacheline boundaries, but consolidates the 15 bits hole before tc_index into a 2 bytes hole before csum, that could now be filled more easily. Reported-by: Patrick Talbert <ptalbert@redhat.com> Fixes: `c93bdd0e03` ("netvm: allow skb allocation to use PFMEMALLOC reserves") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 15:15:16 -07:00
David S. Miller	1ff9c66b08	Merge branch 'sfc-filter-locking-fixes' Bert Kenward says: ==================== sfc: filter locking fixes Two fixes for sfc ef10 filter table locking. Initially spotted by lockdep, but one issue has also been seen in normal use. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 14:52:04 -07:00
Bert Kenward	193f20033c	sfc: hold filter_sem consistently during reset We should take and release the filter_sem consistently during the reset process, in the same manner as the mac_lock and reset_lock. For lockdep consistency we also take the filter_sem for write around other calls to efx->type->init(). Fixes: `c2bebe37c6` ("sfc: give ef10 its own rwsem in the filter table instead of filter_lock") Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 14:52:04 -07:00
Bert Kenward	1c56c0994a	sfc: avoid hang from nested use of the filter_sem In some situations we may end up calling down_read while already holding the semaphore for write, thus hanging. This has been seen when setting the MAC address for the interface. The hung task log in this situation includes this stack: down_read efx_ef10_filter_insert efx_ef10_filter_insert_addr_list efx_ef10_filter_vlan_sync_rx_mode efx_ef10_filter_add_vlan efx_ef10_filter_table_probe efx_ef10_set_mac_address efx_set_mac_address dev_set_mac_address In addition, lockdep rightly points out that nested calling of down_read is incorrect. Fixes: `c2bebe37c6` ("sfc: give ef10 its own rwsem in the filter table instead of filter_lock") Tested-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 14:52:04 -07:00
Florian Fainelli	9e3bff9239	net: systemport: Fix CRC forwarding check for SYSTEMPORT Lite SYSTEMPORT Lite reversed the logic compared to SYSTEMPORT, the GIB_FCS_STRIP bit is set when the Ethernet FCS is stripped, and that bit is not set by default. Fix the logic such that we properly check whether that bit is set or not and we don't forward an extra 4 bytes to the network stack. Fixes: `44a4524c54` ("net: systemport: Add support for SYSTEMPORT Lite") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 14:47:47 -07:00
Stefan Baranoff	70b7ff1302	tcp: allow user to create repair socket without window probes Under rare conditions where repair code may be used it is possible that window probes are either unnecessary or undesired. If the user knows that window probes are not wanted or needed this change allows them to skip sending them when a socket comes out of repair. Signed-off-by: Stefan Baranoff <sbaranoff@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 14:33:45 -07:00
Stefan Baranoff	21684dc46c	tcp: fix sequence numbers for repaired sockets re-using TIME-WAIT sockets This patch fixes a bug where the sequence numbers of a socket created using TCP repair functionality are lower than set after connect is called. This occurs when the repair socket overlaps with a TIME-WAIT socket and triggers the re-use code. The amount lower is equal to the number of times that a particular IP/port set is re-used and then put back into TIME-WAIT. Re-using the first time the sequence number is 1 lower, closing that socket and then re-opening (with repair) a new socket with the same addresses/ports puts the sequence number 2 lower than set via setsockopt. The third time is 3 lower, etc. I have not tested what the limit of this acrewal is, if any. The fix is, if a socket is in repair mode, to respect the already set sequence number and timestamp when it would have already re-used the TIME-WAIT socket. Signed-off-by: Stefan Baranoff <sbaranoff@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 14:33:45 -07:00
Jacob Keller	83fe6b8709	sch_fq_codel: zero q->flows_cnt when fq_codel_init fails When fq_codel_init fails, qdisc_create_dflt will cleanup by using qdisc_destroy. This function calls the ->reset() op prior to calling the ->destroy() op. Unfortunately, during the failure flow for sch_fq_codel, the ->flows parameter is not initialized, so the fq_codel_reset function will null pointer dereference. kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 kernel: IP: fq_codel_reset+0x58/0xd0 [sch_fq_codel] kernel: PGD 0 P4D 0 kernel: Oops: 0000 [#1] SMP PTI kernel: Modules linked in: i40iw i40e(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc devlink ebtable_filter ebtables ip6table_filter ip6_tables rpcrdma ib_isert iscsi_target_mod sunrpc ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support intel_uncore ib_core intel_rapl_perf mei_me mei joydev i2c_i801 lpc_ich ioatdma shpchp wmi sch_fq_codel xfs libcrc32c mgag200 ixgbe drm_kms_helper isci ttm firewire_ohci kernel: mdio drm igb libsas crc32c_intel firewire_core ptp pps_core scsi_transport_sas crc_itu_t dca i2c_algo_bit ipmi_si ipmi_devintf ipmi_msghandler [last unloaded: i40e] kernel: CPU: 10 PID: 4219 Comm: ip Tainted: G OE 4.16.13custom-fq-codel-test+ #3 kernel: Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.05.0004.051120151007 05/11/2015 kernel: RIP: 0010:fq_codel_reset+0x58/0xd0 [sch_fq_codel] kernel: RSP: 0018:ffffbfbf4c1fb620 EFLAGS: 00010246 kernel: RAX: 0000000000000400 RBX: 0000000000000000 RCX: 00000000000005b9 kernel: RDX: 0000000000000000 RSI: ffff9d03264a60c0 RDI: ffff9cfd17b31c00 kernel: RBP: 0000000000000001 R08: 00000000000260c0 R09: ffffffffb679c3e9 kernel: R10: fffff1dab06a0e80 R11: ffff9cfd163af800 R12: ffff9cfd17b31c00 kernel: R13: 0000000000000001 R14: ffff9cfd153de600 R15: 0000000000000001 kernel: FS: 00007fdec2f92800(0000) GS:ffff9d0326480000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000008 CR3: 0000000c1956a006 CR4: 00000000000606e0 kernel: Call Trace: kernel: qdisc_destroy+0x56/0x140 kernel: qdisc_create_dflt+0x8b/0xb0 kernel: mq_init+0xc1/0xf0 kernel: qdisc_create_dflt+0x5a/0xb0 kernel: dev_activate+0x205/0x230 kernel: __dev_open+0xf5/0x160 kernel: __dev_change_flags+0x1a3/0x210 kernel: dev_change_flags+0x21/0x60 kernel: do_setlink+0x660/0xdf0 kernel: ? down_trylock+0x25/0x30 kernel: ? xfs_buf_trylock+0x1a/0xd0 [xfs] kernel: ? rtnl_newlink+0x816/0x990 kernel: ? _xfs_buf_find+0x327/0x580 [xfs] kernel: ? _cond_resched+0x15/0x30 kernel: ? kmem_cache_alloc+0x20/0x1b0 kernel: ? rtnetlink_rcv_msg+0x200/0x2f0 kernel: ? rtnl_calcit.isra.30+0x100/0x100 kernel: ? netlink_rcv_skb+0x4c/0x120 kernel: ? netlink_unicast+0x19e/0x260 kernel: ? netlink_sendmsg+0x1ff/0x3c0 kernel: ? sock_sendmsg+0x36/0x40 kernel: ? ___sys_sendmsg+0x295/0x2f0 kernel: ? ebitmap_cmp+0x6d/0x90 kernel: ? dev_get_by_name_rcu+0x73/0x90 kernel: ? skb_dequeue+0x52/0x60 kernel: ? __inode_wait_for_writeback+0x7f/0xf0 kernel: ? bit_waitqueue+0x30/0x30 kernel: ? fsnotify_grab_connector+0x3c/0x60 kernel: ? __sys_sendmsg+0x51/0x90 kernel: ? do_syscall_64+0x74/0x180 kernel: ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2 kernel: Code: 00 00 48 89 87 00 02 00 00 8b 87 a0 01 00 00 85 c0 0f 84 84 00 00 00 31 ed 48 63 dd 83 c5 01 48 c1 e3 06 49 03 9c 24 90 01 00 00 <48> 8b 73 08 48 8b 3b e8 6c 9a 4f f6 48 8d 43 10 48 c7 03 00 00 kernel: RIP: fq_codel_reset+0x58/0xd0 [sch_fq_codel] RSP: ffffbfbf4c1fb620 kernel: CR2: 0000000000000008 kernel: ---[ end trace e81a62bede66274e ]--- This is caused because flows_cnt is non-zero, but flows hasn't been initialized. fq_codel_init has left the private data in a partially initialized state. To fix this, reset flows_cnt to 0 when we fail to initialize. Additionally, to make the state more consistent, also cleanup the flows pointer when the allocation of backlogs fails. This fixes the NULL pointer dereference, since both the for-loop and memset in fq_codel_reset will be no-ops when flow_cnt is zero. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 12:32:09 -07:00
David S. Miller	352884868a	Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2018-07-12 This series contains updates to ixgbe and e100/e1000 kernel documentation. Alex fixes ixgbe to ensure that we are more explicit about the ordering of updates to the receive address register (RAR) table. Dan Carpenter fixes an issue where we were reading one element beyond the end of the array. Mauro Carvalho Chehab fixes formatting issues in the e100.rst and e1000.rst that were causing errors during 'make htmldocs'. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 10:27:11 -07:00
Mauro Carvalho Chehab	8dc4b1a799	networking: e1000.rst: Get rid of Sphinx warnings Documentation/networking/e1000.rst:83: ERROR: Unexpected indentation. Documentation/networking/e1000.rst:84: WARNING: Block quote ends without a blank line; unexpected unindent. Documentation/networking/e1000.rst:173: WARNING: Definition list ends without a blank line; unexpected unindent. Documentation/networking/e1000.rst:236: WARNING: Definition list ends without a blank line; unexpected unindent. While here, fix highlights and mark a table as such. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-07-12 08:08:20 -07:00
Mauro Carvalho Chehab	b203cc7a4f	networking: e100.rst: Get rid of Sphinx warnings Documentation/networking/e100.rst:57: WARNING: Literal block expected; none found. Documentation/networking/e100.rst:68: WARNING: Literal block expected; none found. Documentation/networking/e100.rst:75: WARNING: Literal block expected; none found. Documentation/networking/e100.rst:84: WARNING: Literal block expected; none found. Documentation/networking/e100.rst:93: WARNING: Inline emphasis start-string without end-string. While here, fix some highlights. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-07-12 08:06:11 -07:00
Dan Carpenter	c411104115	ixgbe: Off by one in ixgbe_ipsec_tx() The ipsec->tx_tbl[] has IXGBE_IPSEC_MAX_SA_COUNT elements so the > needs to be changed to >= so we don't read one element beyond the end of the array. Fixes: `5925947047` ("ixgbe: process the Tx ipsec offload") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-07-12 08:03:09 -07:00
Alexander Duyck	d14c780c11	ixgbe: Be more careful when modifying MAC filters This change makes it so that we are much more explicit about the ordering of updates to the receive address register (RAR) table. Prior to this patch I believe we may have been updating the table while entries were still active, or possibly allowing for reordering of things since we weren't explicitly flushing writes to either the lower or upper portion of the register prior to accessing the other half. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-07-12 07:17:13 -07:00
David S. Miller	672f5ccece	Merge branch 'ieee802154-for-davem-2018-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== pull-request: ieee802154 for net 2018-07-11 An update from ieee802154 for your net tree. Build system fix for a missing include from Arnd Bergmann. Setting the IFLA_LINK for the lowpan parent from Lubomir Rintel. Fixes for some RX corner cases in adf7242 driver by Michael Hennerich. And some small patches to cleanup our BUG_ON vs WARN_ON usage. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 00:19:06 -07:00
Ewan D. Milne	20c4515a1a	qed: fix spelling mistake "successffuly" -> "successfully" Trivial fix to spelling mistake in qed_probe message. Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 00:02:05 -07:00
Russell King	576cd32082	sfp: fix module initialisation with netdev already up It was been observed that with a particular order of initialisation, the netdev can be up, but the SFP module still has its TX_DISABLE signal asserted. This occurs when the network device brought up before the SFP kernel module has been inserted by userspace. This occurs because sfp-bus layer does not hear about the change in network device state, and so assumes that it is still down. Set netdev->sfp when the upstream is registered to work around this problem. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 23:11:34 -07:00
Russell King	f20a4c46b9	sfp: ensure we clean up properly on bus registration failure We fail to correctly clean up after a bus registration failure, which can lead to an incorrect assumption about the registration state of the upstream or sfp cage. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 23:11:34 -07:00
Taehee Yoo	0026129c86	rhashtable: add restart routine in rhashtable_free_and_destroy() rhashtable_free_and_destroy() cancels re-hash deferred work then walks and destroys elements. at this moment, some elements can be still in future_tbl. that elements are not destroyed. test case: nft_rhash_destroy() calls rhashtable_free_and_destroy() to destroy all elements of sets before destroying sets and chains. But rhashtable_free_and_destroy() doesn't destroy elements of future_tbl. so that splat occurred. test script: %cat test.nft table ip aa { map map1 { type ipv4_addr : verdict; elements = { 0 : jump a0, 1 : jump a0, 2 : jump a0, 3 : jump a0, 4 : jump a0, 5 : jump a0, 6 : jump a0, 7 : jump a0, 8 : jump a0, 9 : jump a0, } } chain a0 { } } flush ruleset table ip aa { map map1 { type ipv4_addr : verdict; elements = { 0 : jump a0, 1 : jump a0, 2 : jump a0, 3 : jump a0, 4 : jump a0, 5 : jump a0, 6 : jump a0, 7 : jump a0, 8 : jump a0, 9 : jump a0, } } chain a0 { } } flush ruleset %while :; do nft -f test.nft; done Splat looks like: [ 200.795603] kernel BUG at net/netfilter/nf_tables_api.c:1363! [ 200.806944] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 200.812253] CPU: 1 PID: 1582 Comm: nft Not tainted 4.17.0+ #24 [ 200.820297] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015 [ 200.830309] RIP: 0010:nf_tables_chain_destroy.isra.34+0x62/0x240 [nf_tables] [ 200.838317] Code: 43 50 85 c0 74 26 48 8b 45 00 48 8b 4d 08 ba 54 05 00 00 48 c7 c6 60 6d 29 c0 48 c7 c7 c0 65 29 c0 4c 8b 40 08 e8 58 e5 fd f8 <0f> 0b 48 89 da 48 b8 00 00 00 00 00 fc ff [ 200.860366] RSP: 0000:ffff880118dbf4d0 EFLAGS: 00010282 [ 200.866354] RAX: 0000000000000061 RBX: ffff88010cdeaf08 RCX: 0000000000000000 [ 200.874355] RDX: 0000000000000061 RSI: 0000000000000008 RDI: ffffed00231b7e90 [ 200.882361] RBP: ffff880118dbf4e8 R08: ffffed002373bcfb R09: ffffed002373bcfa [ 200.890354] R10: 0000000000000000 R11: ffffed002373bcfb R12: dead000000000200 [ 200.898356] R13: dead000000000100 R14: ffffffffbb62af38 R15: dffffc0000000000 [ 200.906354] FS: 00007fefc31fd700(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000 [ 200.915533] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 200.922355] CR2: 0000557f1c8e9128 CR3: 0000000106880000 CR4: 00000000001006e0 [ 200.930353] Call Trace: [ 200.932351] ? nf_tables_commit+0x26f6/0x2c60 [nf_tables] [ 200.939525] ? nf_tables_setelem_notify.constprop.49+0x1a0/0x1a0 [nf_tables] [ 200.947525] ? nf_tables_delchain+0x6e0/0x6e0 [nf_tables] [ 200.952383] ? nft_add_set_elem+0x1700/0x1700 [nf_tables] [ 200.959532] ? nla_parse+0xab/0x230 [ 200.963529] ? nfnetlink_rcv_batch+0xd06/0x10d0 [nfnetlink] [ 200.968384] ? nfnetlink_net_init+0x130/0x130 [nfnetlink] [ 200.975525] ? debug_show_all_locks+0x290/0x290 [ 200.980363] ? debug_show_all_locks+0x290/0x290 [ 200.986356] ? sched_clock_cpu+0x132/0x170 [ 200.990352] ? find_held_lock+0x39/0x1b0 [ 200.994355] ? sched_clock_local+0x10d/0x130 [ 200.999531] ? memset+0x1f/0x40 V2: - free all tables requested by Herbert Xu Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-09 16:28:51 -07:00
David S. Miller	252dd17642	Merge branch 'bnxt_en-Bug-fixes' Michael Chan says: ==================== bnxt_en: Bug fixes. These are bug fixes in error code paths, TC Flower VLAN TCI flow checking bug fix, proper filtering of Broadcast packets if IFF_BROADCAST is not set, and a bug fix in bnxt_get_max_rings() to return 0 ring parameters when the return value is -ENOMEM. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-09 16:27:15 -07:00
Vikas Gupta	c58387ab16	bnxt_en: Fix for system hang if request_irq fails Fix bug in the error code path when bnxt_request_irq() returns failure. bnxt_disable_napi() should not be called in this error path because NAPI has not been enabled yet. Fixes: `c0c050c58d` ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-09 16:27:14 -07:00
Michael Chan	30f529473e	bnxt_en: Do not modify max IRQ count after RDMA driver requests/frees IRQs. Calling bnxt_set_max_func_irqs() to modify the max IRQ count requested or freed by the RDMA driver is flawed. The max IRQ count is checked when re-initializing the IRQ vectors and this can happen multiple times during ifup or ethtool -L. If the max IRQ is reduced and the RDMA driver is operational, we may not initailize IRQs correctly. This problem shows up on VFs with very small number of MSIX. There is no other logic that relies on the IRQ count excluding the ones used by RDMA. So we fix it by just removing the call to subtract or add the IRQs used by RDMA. Fixes: `a588e4580a` ("bnxt_en: Add interface to support RDMA driver.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-09 16:27:14 -07:00
Michael Chan	30e338487a	bnxt_en: Support clearing of the IFF_BROADCAST flag. Currently, the driver assumes IFF_BROADCAST is always set and always sets the broadcast filter. Modify the code to set or clear the broadcast filter according to the IFF_BROADCAST flag. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-09 16:27:14 -07:00
Michael Chan	78f058a4aa	bnxt_en: Always set output parameters in bnxt_get_max_rings(). The current code returns -ENOMEM and does not bother to set the output parameters to 0 when no rings are available. Some callers, such as bnxt_get_channels() will display garbage ring numbers when that happens. Fix it by always setting the output parameters. Fixes: `6e6c5a57fb` ("bnxt_en: Modify bnxt_get_max_rings() to support shared or non shared rings.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-09 16:27:14 -07:00
Michael Chan	07f4fde53d	bnxt_en: Fix inconsistent BNXT_FLAG_AGG_RINGS logic. If there aren't enough RX rings available, the driver will attempt to use a single RX ring without the aggregation ring. If that also fails, the BNXT_FLAG_AGG_RINGS flag is cleared but the other ring parameters are not set consistently to reflect that. If more RX rings become available at the next open, the RX rings will be in an inconsistent state and may crash when freeing the RX rings. Fix it by restoring the BNXT_FLAG_AGG_RINGS if not enough RX rings are available to run without aggregation rings. Fixes: `bdbd1eb59c` ("bnxt_en: Handle no aggregation ring gracefully.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-09 16:27:14 -07:00
Venkat Duvvuru	e32d4e60b3	bnxt_en: Fix the vlan_tci exact match check. It is possible that OVS may set don’t care for DEI/CFI bit in vlan_tci mask. Hence, checking for vlan_tci exact match will endup in a vlan flow rejection. This patch fixes the problem by checking for vlan_pcp and vid separately, instead of checking for the entire vlan_tci. Fixes: `e85a9be93c` (bnxt_en: do not allow wildcard matches for L2 flows) Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-09 16:27:14 -07:00
David S. Miller	26420d9ce0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree: 1) Missing module autoloadfor icmp and icmpv6 x_tables matches, from Florian Westphal. 2) Possible non-linear access to TCP header from tproxy, from Mate Eckl. 3) Do not allow rbtree to be used for single elements, this patch moves all set backend into one single module since such thing can only happen if hashtable module is explicitly blacklisted, which should not ever be done. 4) Reject error and standard targets from nft_compat for sanity reasons, they are never used from there. 5) Don't crash on double hashsize module parameter, from Andrey Ryabinin. 6) Drop dst on skb before placing it in the fragmentation reassembly queue, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-09 14:23:13 -07:00
Florian Westphal	84379c9afe	netfilter: ipv6: nf_defrag: drop skb dst before queueing Eric Dumazet reports: Here is a reproducer of an annoying bug detected by syzkaller on our production kernel [..] ./b78305423 enable_conntrack Then : sleep 60 dmesg \| tail -10 [ 171.599093] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 181.631024] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 191.687076] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 201.703037] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 211.711072] unregister_netdevice: waiting for lo to become free. Usage count = 2 [ 221.959070] unregister_netdevice: waiting for lo to become free. Usage count = 2 Reproducer sends ipv6 fragment that hits nfct defrag via LOCAL_OUT hook. skb gets queued until frag timer expiry -- 1 minute. Normally nf_conntrack_reasm gets called during prerouting, so skb has no dst yet which might explain why this wasn't spotted earlier. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2018-07-09 18:04:12 +02:00
Andrey Ryabinin	2045cdfa1b	netfilter: nf_conntrack: Fix possible possible crash on module loading. Loading the nf_conntrack module with doubled hashsize parameter, i.e. modprobe nf_conntrack hashsize=12345 hashsize=12345 causes NULL-ptr deref. If 'hashsize' specified twice, the nf_conntrack_set_hashsize() function will be called also twice. The first nf_conntrack_set_hashsize() call will set the 'nf_conntrack_htable_size' variable: nf_conntrack_set_hashsize() ... /* On boot, we can set this without any fancy locking. */ if (!nf_conntrack_htable_size) return param_set_uint(val, kp); But on the second invocation, the nf_conntrack_htable_size is already set, so the nf_conntrack_set_hashsize() will take a different path and call the nf_conntrack_hash_resize() function. Which will crash on the attempt to dereference 'nf_conntrack_hash' pointer: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: 0010:nf_conntrack_hash_resize+0x255/0x490 [nf_conntrack] Call Trace: nf_conntrack_set_hashsize+0xcd/0x100 [nf_conntrack] parse_args+0x1f9/0x5a0 load_module+0x1281/0x1a50 __se_sys_finit_module+0xbe/0xf0 do_syscall_64+0x7c/0x390 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fix this, by checking !nf_conntrack_hash instead of !nf_conntrack_htable_size. nf_conntrack_hash will be initialized only after the module loaded, so the second invocation of the nf_conntrack_set_hashsize() won't crash, it will just reinitialize nf_conntrack_htable_size again. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2018-07-09 18:04:11 +02:00
Florian Westphal	21d5e07819	netfilter: nft_compat: explicitly reject ERROR and standard target iptables-nft never requests these, but make this explicitly illegal. If it were quested, kernel could oops as ->eval is NULL, furthermore, the builtin targets have no owning module so its possible to rmmod eb/ip/ip6_tables module even if they would be loaded. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2018-07-09 16:06:19 +02:00
Michael Hennerich	58e9683d14	net: ieee802154: adf7242: Fix OCL calibration runs Reissuing RC_RX every 400ms - to adjust for offset drift in receiver see datasheet page 61, OCL section. Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>	2018-07-09 10:53:12 +02:00
Michael Hennerich	36d26d6b62	net: ieee802154: adf7242: Fix erroneous RX enable Only enable RX mode if the netdev is opened. Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>	2018-07-09 10:53:11 +02:00
Stefan Schmidt	8f2fbc6c60	ieee802154: fakelb: switch from BUG_ON() to WARN_ON() on problem The check is valid but it does not warrant to crash the kernel. A WARN_ON() is good enough here. Found by checkpatch. Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>	2018-07-09 10:05:47 +02:00
Stefan Schmidt	8a81388ec2	ieee802154: at86rf230: use __func__ macro for debug messages Instead of having the function name hard-coded (it might change and we forgot to update them in the debug output) we can use __func__ instead and also shorter the line so we do not need to break it. Also fix an extra blank line while being here. Found by checkpatch. Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>	2018-07-09 10:05:46 +02:00
Stefan Schmidt	20f330452a	ieee802154: at86rf230: switch from BUG_ON() to WARN_ON() on problem The check is valid but it does not warrant to crash the kernel. A WARN_ON() is good enough here. Found by checkpatch. Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>	2018-07-09 10:05:46 +02:00
Eric Dumazet	6508b6781b	tcp: cleanup copied_seq and urg_data in tcp_disconnect tcp_zerocopy_receive() relies on tcp_inq() to limit number of bytes requested by user. syzbot found that after tcp_disconnect(), tcp_inq() was returning a stale value (number of bytes in queue before the disconnect). Note that after this patch, ioctl(fd, SIOCINQ, &val) is also fixed and returns 0, so this might be a candidate for all known linux kernels. While we are at this, we probably also should clear urg_data to avoid other syzkaller reports after it discovers how to deal with urgent data. syzkaller repro : socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3 bind(3, {sa_family=AF_INET, sin_port=htons(20000), sin_addr=inet_addr("224.0.0.1")}, 16) = 0 connect(3, {sa_family=AF_INET, sin_port=htons(20000), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 send(3, ..., 4096, 0) = 4096 connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 128) = 0 getsockopt(3, SOL_TCP, TCP_ZEROCOPY_RECEIVE, ..., [16]) = 0 // CRASH Fixes: `05255b823a` ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-08 16:56:33 +09:00
David S. Miller	7f93d12951	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== pull-request: bpf 2018-07-07 The following pull-request contains BPF updates for your net tree. Plenty of fixes for different components: 1) A set of critical fixes for sockmap and sockhash, from John Fastabend. 2) fixes for several race conditions in af_xdp, from Magnus Karlsson. 3) hash map refcnt fix, from Mauricio Vasquez. 4) samples/bpf fixes, from Taeung Song. 5) ifup+mtu check for xdp_redirect, from Toshiaki Makita. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-08 13:06:55 +09:00
Paolo Abeni	f6f2a4a2eb	ipfrag: really prevent allocation on netns exit Setting the low threshold to 0 has no effect on frags allocation, we need to clear high_thresh instead. The code was pre-existent to commit `648700f76b` ("inet: frags: use rhashtables for reassembly units"), but before the above, such assignment had a different role: prevent concurrent eviction from the worker and the netns cleanup helper. Fixes: `648700f76b` ("inet: frags: use rhashtables for reassembly units") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-08 13:05:33 +09:00
Lorenzo Colitti	acc2cf4e37	net: diag: Don't double-free TCP_NEW_SYN_RECV sockets in tcp_abort When tcp_diag_destroy closes a TCP_NEW_SYN_RECV socket, it first frees it by calling inet_csk_reqsk_queue_drop_and_and_put in tcp_abort, and then frees it again by calling sock_gen_put. Since tcp_abort only has one caller, and all the other codepaths in tcp_abort don't free the socket, just remove the free in that function. Cc: David Ahern <dsa@cumulusnetworks.com> Tested: passes Android sock_diag_test.py, which exercises this codepath Fixes: `d7226c7a4d` ("net: diag: Fix refcnt leak in error path destroying socket") Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-08 10:56:10 +09:00
David Ahern	e7372197e1	net/ipv4: Set oif in fib_compute_spec_dst Xin reported that icmp replies may not use the address on the device the echo request is received if the destination address is broadcast. Instead a route lookup is done without considering VRF context. Fix by setting oif in flow struct to the master device if it is enslaved. That directs the lookup to the VRF table. If the device is not enslaved, oif is still 0 so no affect. Fixes: `cd2fbe1b6b` ("net: Use VRF device index for lookups on RX") Reported-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-08 10:54:58 +09:00
Toshiaki Makita	d8d7218ad8	xdp: XDP_REDIRECT should check IFF_UP and MTU Otherwise we end up with attempting to send packets from down devices or to send oversized packets, which may cause unexpected driver/device behaviour. Generic XDP has already done this check, so reuse the logic in native XDP. Fixes: `814abfabef` ("xdp: add bpf_redirect helper function") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-07-07 15:25:35 -07:00
Alexei Starovoitov	4fb126cbcb	Merge branch 'sockhash-fixes' John Fastabend says: ==================== First three patches resolve issues found while testing sockhash and reviewing code. Syzbot also found them about the same time as I was working on fixes. The main issue is in the sockhash path we reduced the scope of sk_callback lock but this meant we could get update and close running in parallel so fix that here. Then testing sk_msg and sk_skb programs together found that skb->dev is not always assigned and some of the helpers were depending on this to lookup max mtu. Fix this by using SKB_MAX_ALLOC when no MTU is available. Finally, Martin spotted that the sockmap code was still using the qdisc skb cb structure. But I was sure we had fixed this long ago. Looks like we missed it in a merge conflict resolution and then by chance data_end offset was the same in both structures so everything sort of continued to work even though it could break at any moment if the structs ever change. So redo the conversion and this time also convert the helpers. v2: fix '0 files changed' issue in patches ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-07-07 15:19:31 -07:00
John Fastabend	0ea488ff8d	bpf: sockmap, convert bpf_compute_data_pointers to bpf__sk_skb In commit 'bpf: bpf_compute_data uses incorrect cb structure' (`8108a77515`) we added the routine bpf_compute_data_end_sk_skb() to compute the correct data_end values, but this has since been lost. In kernel v4.14 this was correct and the above patch was applied in it entirety. Then when v4.14 was merged into v4.15-rc1 net-next tree we lost the piece that renamed bpf_compute_data_pointers to the new function bpf_compute_data_end_sk_skb. This was done here, `e1ea2f9856` ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") When it conflicted with the following rename patch, `6aaae2b6c4` ("bpf: rename bpf_compute_data_end into bpf_compute_data_pointers") Finally, after a refactor I thought even the function bpf_compute_data_end_sk_skb() was no longer needed and it was erroneously removed. However, we never reverted the sk_skb_convert_ctx_access() usage of tcp_skb_cb which had been committed and survived the merge conflict. Here we fix this by adding back the helper and _data_end_sk_skb() usage. Using the bpf_skc_data_end mapping is not correct because it expects a qdisc_skb_cb object but at the sock layer this is not the case. Even though it happens to work here because we don't overwrite any data in-use at the socket layer and the cb structure is cleared later this has potential to create some subtle issues. But, even more concretely the filter.c access check uses tcp_skb_cb. And by some act of chance though, struct bpf_skb_data_end { struct qdisc_skb_cb qdisc_cb; /* 0 28 / / XXX 4 bytes hole, try to pack / void data_meta; /* 32 8 / void data_end; /* 40 8 / / size: 48, cachelines: 1, members: 3 / / sum members: 44, holes: 1, sum holes: 4 / / last cacheline: 48 bytes / }; and then tcp_skb_cb, struct tcp_skb_cb { [...] struct { __u32 flags; / 24 4 / struct sock sk_redir; /* 32 8 / void data_end; /* 40 8 / } bpf; / 24 */ }; So when we use offset_of() to track down the byte offset we get 40 in either case and everything continues to work. Fix this mess and use correct structures its unclear how long this might actually work for until someone moves the structs around. Reported-by: Martin KaFai Lau <kafai@fb.com> Fixes: `e1ea2f9856` ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") Fixes: `6aaae2b6c4` ("bpf: rename bpf_compute_data_end into bpf_compute_data_pointers") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-07-07 15:19:30 -07:00
John Fastabend	7ebc14d507	bpf: sockmap, consume_skb in close path Currently, when a sock is closed and the bpf_tcp_close() callback is used we remove memory but do not free the skb. Call consume_skb() if the skb is attached to the buffer. Reported-by: syzbot+d464d2c20c717ef5a6a8@syzkaller.appspotmail.com Fixes: `1aa12bdf1b` ("bpf: sockmap, add sock close() hook to remove socks") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-07-07 15:19:30 -07:00
John Fastabend	99ba2b5aba	bpf: sockhash, disallow bpf_tcp_close and update in parallel After latest lock updates there is no longer anything preventing a close and recvmsg call running in parallel. Additionally, we can race update with close if we close a socket and simultaneously update if via the BPF userspace API (note the cgroup ops are already run with sock_lock held). To resolve this take sock_lock in close and update paths. Reported-by: syzbot+b680e42077a0d7c9a0c4@syzkaller.appspotmail.com Fixes: `e9db4ef6bf` ("bpf: sockhash fix omitted bucket lock in sock_close") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-07-07 15:19:30 -07:00
John Fastabend	0c6bc6e531	bpf: fix sk_skb programs without skb->dev assigned Multiple BPF helpers in use by sk_skb programs calculate the max skb length using the __bpf_skb_max_len function. However, this calculates the max length using the skb->dev pointer which can be NULL when an sk_skb program is paired with an sk_msg program. To force this a sk_msg program needs to redirect into the ingress path of a sock with an attach sk_skb program. Then the the sk_skb program would need to call one of the helpers that adjust the skb size. To fix the null ptr dereference use SKB_MAX_ALLOC size if no dev is available. Fixes: `8934ce2fd0` ("bpf: sockmap redirect ingress support") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-07-07 15:19:29 -07:00
Alexei Starovoitov	631da8534a	Merge branch 'sockmap-fixes' John Fastabend says: ==================== I missed fixing the error path in the sockhash code to align with supporting socks in multiple maps. Simply checking if the psock is present does not mean we can decrement the reference count because it could be part of another map. Fix this by cleaning up the error path so this situation does not happen. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-07-07 15:16:59 -07:00
John Fastabend	1d1ef005db	bpf: sockmap, hash table is RCU so readers do not need locks This removes locking from readers of RCU hash table. Its not necessary. Fixes: `8111038444` ("bpf: sockmap, add hash map support") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-07-07 15:16:58 -07:00

1 2 3 4 5 ...

767402 Commits