linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
David S. Miller	bc0fbc66ad	Merge branch 'ipconfig-NTP-server-support-bug-fixes-documentation-improvements' Chris Novakovic says: ==================== ipconfig: NTP server support, bug fixes, documentation improvements This series (against net-next) makes various improvements to ipconfig: - Patch #1 correctly documents the behaviour of parameter 4 in the "ip=" and "nfsaddrs=" command line parameter. - Patch #2 tidies up the printk()s for reporting configured name servers. - Patch #3 fixes a bug in autoconfiguration via BOOTP whereby the IP addresses of IEN-116 name servers are requested from the BOOTP server, rather than those of DNS name servers. - Patch #4 requests the number of DNS servers specified by CONF_NAMESERVERS_MAX when autoconfiguring via BOOTP, rather than hardcoding it to 2. - Patch #5 fully documents the contents and format of /proc/net/pnp in Documentation/filesystems/nfs/nfsroot.txt. - Patch #6 fixes a bug whereby bogus information is written to /proc/net/pnp when ipconfig is not used. - Patch #7 creates a new procfs directory for ipconfig-related configuration reports at /proc/net/ipconfig. - Patch #8 allows for NTP servers to be configured (manually on the kernel command line or automatically via DHCP), enabling systems with an NFS root filesystem to synchronise their clock before mounting their root filesystem. NTP server IP addresses are written to /proc/net/ipconfig/ntp_servers. Changes from v1: - David requested that a new directory /proc/net/ipconfig be created to contain ipconfig-related configuration reports, which is implemented in the new patch #7. NTP server IPs are now written to this directory instead of /proc/net/ntp in the new patch #8. - Cong and David both requested that the modification to CREDITS be dropped. This patch has been removed from the series. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:40:42 -04:00
Chris Novakovic	c04d2cb200	ipconfig: Write NTP server IPs to /proc/net/ipconfig/ntp_servers Distributed filesystems are most effective when the server and client clocks are synchronised. Embedded devices often use NFS for their root filesystem but typically do not contain an RTC, so the clocks of the NFS server and the embedded device will be out-of-sync when the root filesystem is mounted (and may not be synchronised until late in the boot process). Extend ipconfig with the ability to export IP addresses of NTP servers it discovers to /proc/net/ipconfig/ntp_servers. They can be supplied as follows: - If ipconfig is configured manually via the "ip=" or "nfsaddrs=" kernel command line parameters, one NTP server can be specified in the new "<ntp0-ip>" parameter. - If ipconfig is autoconfigured via DHCP, request DHCP option 42 in the DHCPDISCOVER message, and record the IP addresses of up to three NTP servers sent by the responding DHCP server in the subsequent DHCPOFFER message. ipconfig will only write the NTP server IP addresses it discovers to /proc/net/ipconfig/ntp_servers, one per line (in the order received from the DHCP server, if DHCP autoconfiguration is used); making use of these NTP servers is the responsibility of a user space process (e.g. an initrd/initram script that invokes an NTP client before mounting an NFS root filesystem). Signed-off-by: Chris Novakovic <chris@chrisn.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:40:42 -04:00
Chris Novakovic	4d019b3f80	ipconfig: Create /proc/net/ipconfig directory To allow ipconfig to report IP configuration details to user space processes without cluttering /proc/net, create a new subdirectory /proc/net/ipconfig. All files containing IP configuration details should be written to this directory. Signed-off-by: Chris Novakovic <chris@chrisn.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:40:42 -04:00
Chris Novakovic	300eec7c0a	ipconfig: Correctly initialise ic_nameservers ic_nameservers, which stores the list of name servers discovered by ipconfig, is initialised (i.e. has all of its elements set to NONE, or 0xffffffff) by ic_nameservers_predef() in the following scenarios: - before the "ip=" and "nfsaddrs=" kernel command line parameters are parsed (in ip_auto_config_setup()); - before autoconfiguring via DHCP or BOOTP (in ic_bootp_init()), in order to clear any values that may have been set after parsing "ip=" or "nfsaddrs=" and are no longer needed. This means that ic_nameservers_predef() is not called when neither "ip=" nor "nfsaddrs=" is specified on the kernel command line. In this scenario, every element in ic_nameservers remains set to 0x00000000, which is indistinguishable from ANY and causes pnp_seq_show() to write the following (bogus) information to /proc/net/pnp: #MANUAL nameserver 0.0.0.0 nameserver 0.0.0.0 nameserver 0.0.0.0 This is potentially problematic for systems that blindly link /etc/resolv.conf to /proc/net/pnp. Ensure that ic_nameservers is also initialised when neither "ip=" nor "nfsaddrs=" are specified by calling ic_nameservers_predef() in ip_auto_config(), but only when ip_auto_config_setup() was not called earlier. This causes the following to be written to /proc/net/pnp, and is consistent with what gets written when ipconfig is configured manually but no name servers are specified on the kernel command line: #MANUAL Signed-off-by: Chris Novakovic <chris@chrisn.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:40:41 -04:00
Chris Novakovic	8b0b37c564	ipconfig: Document /proc/net/pnp Fully document the format used by the /proc/net/pnp file written by ipconfig, explain where its values originate from, and clarify that the tertiary name server IP and DNS domain name are only written to the file when autoconfiguration is used. Signed-off-by: Chris Novakovic <chris@chrisn.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:40:41 -04:00
Chris Novakovic	de1fa15b66	ipconfig: BOOTP: Request CONF_NAMESERVERS_MAX name servers When ipconfig is autoconfigured via BOOTP, the request packet initialised by ic_bootp_init_ext() always allocates 8 bytes for the name server option, limiting the BOOTP server to responding with at most 2 name servers even though ipconfig in fact supports an arbitrary number of name servers (as defined by CONF_NAMESERVERS_MAX, which is currently 3). Only request name servers in the request packet if CONF_NAMESERVERS_MAX is positive (to comply with [1, §3.8]), and allocate enough space in the packet for CONF_NAMESERVERS_MAX name servers to indicate the maximum number we can accept in response. [1] RFC 2132, "DHCP Options and BOOTP Vendor Extensions": https://tools.ietf.org/rfc/rfc2132.txt Signed-off-by: Chris Novakovic <chris@chrisn.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:40:41 -04:00
Chris Novakovic	4e1a8af28d	ipconfig: BOOTP: Don't request IEN-116 name servers When ipconfig is autoconfigured via BOOTP, the request packet initialised by ic_bootp_init_ext() allocates 8 bytes for tag 5 ("Name Server" [1, §3.7]), but tag 5 in the response isn't processed by ic_do_bootp_ext(). Instead, allocate the 8 bytes to tag 6 ("Domain Name Server" [1, §3.8]), which is processed by ic_do_bootp_ext(), and appears to have been the intended tag to request. This won't cause any breakage for existing users, as tag 5 responses provided by BOOTP servers weren't being processed anyway. [1] RFC 2132, "DHCP Options and BOOTP Vendor Extensions": https://tools.ietf.org/rfc/rfc2132.txt Signed-off-by: Chris Novakovic <chris@chrisn.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:40:41 -04:00
Chris Novakovic	e18bdc83ae	ipconfig: Tidy up reporting of name servers Commit `5e953778a2` ("ipconfig: add nameserver IPs to kernel-parameter ip=") adds the IP addresses of discovered name servers to the summary printed by ipconfig when configuration is complete. It appears the intention in ip_auto_config() was to print the name servers on a new line (especially given the spacing and lack of comma before "nameserver0="), but they're actually printed on the same line as the NFS root filesystem configuration summary: [ 0.686186] IP-Config: Complete: [ 0.686226] device=eth0, hwaddr=xx:xx:xx:xx:xx:xx, ipaddr=10.0.0.2, mask=255.255.255.0, gw=10.0.0.1 [ 0.686328] host=test, domain=example.com, nis-domain=(none) [ 0.686386] bootserver=10.0.0.1, rootserver=10.0.0.1, rootpath= nameserver0=10.0.0.1 This makes it harder to read and parse ipconfig's output. Instead, print the name servers on a separate line: [ 0.791250] IP-Config: Complete: [ 0.791289] device=eth0, hwaddr=xx:xx:xx:xx:xx:xx, ipaddr=10.0.0.2, mask=255.255.255.0, gw=10.0.0.1 [ 0.791407] host=test, domain=example.com, nis-domain=(none) [ 0.791475] bootserver=10.0.0.1, rootserver=10.0.0.1, rootpath= [ 0.791476] nameserver0=10.0.0.1 Signed-off-by: Chris Novakovic <chris@chrisn.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:40:41 -04:00
Chris Novakovic	660de409e2	ipconfig: Document setting of NIS domain name ic_do_bootp_ext() is responsible for parsing the "ip=" and "nfsaddrs=" kernel parameters. If a "." character is found in parameter 4 (the client's hostname), everything before the first "." is used as the hostname, and everything after it is used as the NIS domain name (but not necessarily the DNS domain name). Document this behaviour in Documentation/filesystems/nfs/nfsroot.txt, as it is not made explicit. Signed-off-by: Chris Novakovic <chris@chrisn.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:40:41 -04:00
Florian Fainelli	d805c52093	net: ethtool: Add missing kernel doc for FEC parameters While adding support for ethtool::get_fecparam and set_fecparam, kernel doc for these functions was missed, add those. Fixes: `1a5f3da20b` ("net: ethtool: add support for forward error correction modes") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:38:42 -04:00
David S. Miller	5cb5ce3363	Merge branch 'rhash-cleanups' NeilBrown says: ==================== A few rhashtables cleanups 2 patches fixes documentation 1 fixes a bit in rhashtable_walk_start() 1 improves rhashtable_walk stability. All reviewed and Acked. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:21:46 -04:00
NeilBrown	5d240a8936	rhashtable: improve rhashtable_walk stability when stop/start used. When a walk of an rhashtable is interrupted with rhastable_walk_stop() and then rhashtable_walk_start(), the location to restart from is based on a 'skip' count in the current hash chain, and this can be incorrect if insertions or deletions have happened. This does not happen when the walk is not stopped and started as iter->p is a placeholder which is safe to use while holding the RCU read lock. In rhashtable_walk_start() we can revalidate that 'p' is still in the same hash chain. If it isn't then the current method is still used. With this patch, if a rhashtable walker ensures that the current object remains in the table over a stop/start period (possibly by elevating the reference count if that is sufficient), it can be sure that a walk will not miss objects that were in the hashtable for the whole time of the walk. rhashtable_walk_start() may not find the object even though it is still in the hashtable if a rehash has moved it to a new table. In this case it will (eventually) get -EAGAIN and will need to proceed through the whole table again to be sure to see everything at least once. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:21:46 -04:00
NeilBrown	b41cc04b66	rhashtable: reset iter when rhashtable_walk_start sees new table The documentation claims that when rhashtable_walk_start_check() detects a resize event, it will rewind back to the beginning of the table. This is not true. We need to set ->slot and ->skip to be zero for it to be true. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:21:45 -04:00
NeilBrown	82266e98dd	rhashtable: Revise incorrect comment on r{hl, hash}table_walk_enter() Neither rhashtable_walk_enter() or rhltable_walk_enter() sleep, though they do take a spinlock without irq protection. So revise the comments to accurately state the contexts in which these functions can be called. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:21:45 -04:00
NeilBrown	0c6f69a5e3	rhashtable: remove outdated comments about grow_decision etc grow_decision and shink_decision no longer exist, so remove the remaining references to them. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:21:45 -04:00
Eric Dumazet	8c2320e84c	tcp: md5: only call tp->af_specific->md5_lookup() for md5 sockets RETPOLINE made calls to tp->af_specific->md5_lookup() quite expensive, given they have no result. We can omit the calls for sockets that have no md5 keys. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:20:03 -04:00
Willem de Bruijn	a6361f0ca4	packet: fix bitfield update race Updates to the bitfields in struct packet_sock are not atomic. Serialize these read-modify-write cycles. Move po->running into a separate variable. Its writes are protected by po->bind_lock (except for one startup case at packet_create). Also replace a textual precondition warning with lockdep annotation. All others are set only in packet_setsockopt. Serialize these updates by holding the socket lock. Analogous to other field updates, also hold the lock when testing whether a ring is active (pg_vec). Fixes: `8dc4194474` ("[PACKET]: Add optional checksum computation for recvmsg") Reported-by: DaeRyong Jeong <threeearcat@gmail.com> Reported-by: Byoungyoung Lee <byoungyoung@purdue.edu> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 13:17:08 -04:00
Ben Shelton	30d84397af	ice: Do not check INTEVENT bit for OICR interrupts According to the hardware spec, checking the INTEVENT bit isn't a reliable way to detect if an OICR interrupt has occurred. This is because this bit can be cleared by the hardware/firmware before the interrupt service routine has run. So instead, just check for OICR events every time. Fixes: `940b61af02` ("ice: Initialize PF and setup miscellaneous interrupt") Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-04-24 09:03:23 -07:00
Anirudh Venkataramanan	34357a90d5	ice: Fix incorrect comment for action type Action type 5 defines large action generic values. Fix comment to reflect that better. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-04-24 08:56:56 -07:00
Anirudh Venkataramanan	d332a38c95	ice: Fix initialization for num_nodes_added ice_sched_add_nodes_to_layer is used recursively, and so we start with num_nodes_added being 0. This way, in case of an error or if num_nodes is NULL, the function just returns 0 to indicate that no nodes were added. Fixes: `5513b920a4` ("ice: Update Tx scheduler tree for VSI multi-Tx queue support") Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-04-24 08:55:42 -07:00
Vinicius Costa Gomes	2707df9773	igb: Fix the transmission mode of queue 0 for Qav mode When Qav mode is enabled, queue 0 should be kept on Stream Reservation mode. From the i210 datasheet, section 8.12.19: "Note: Queue0 QueueMode must be set to 1b when TransmitMode is set to Qav." ("QueueMode 1b" represents the Stream Reservation mode) The solution is to give queue 0 the all the credits it might need, so it has priority over queue 1. A situation where this can happen is when cbs is "installed" only on queue 1, leaving queue 0 alone. For example: $ tc qdisc replace dev enp2s0 handle 100: parent root mqprio num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0 $ tc qdisc replace dev enp2s0 parent 100:2 cbs locredit -1470 \ hicredit 30 sendslope -980000 idleslope 20000 offload 1 Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-04-24 08:53:19 -07:00
Colin Ian King	39035bfdc3	ixgbevf: ensure xdp_ring resources are free'd on error exit The current error handling for failed resource setup for xdp_ring data is a break out of the loop and returning 0 indicated everything was OK, when in fact it is not. Fix this by exiting via the error exit label err_setup_tx that will clean up the resources correctly and return and error status. Detected by CoverityScan, CID#1466879 ("Logically dead code") Fixes: `21092e9ce8` ("ixgbevf: Add support for XDP_TX action") Signed-off-by: Colin Ian King <colin.king@canonical.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2018-04-24 08:20:40 -07:00
Yafang Shao	a06ac0d67d	Revert "net: init sk_cookie for inet socket" This reverts commit <c6849a3ac17e> ("net: init sk_cookie for inet socket") Per discussion with Eric, when update sock_net(sk)->cookie_gen, the whole cache cache line will be invalidated, as this cache line is shared with all cpus, that may cause great performace hit. Bellow is the data form Eric. "Performance is reduced from ~5 Mpps to ~3.8 Mpps with 16 RX queues on my host" when running synflood test. Have to revert it to prevent from cache line false sharing. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 11:15:32 -04:00
David S. Miller	8399743a5a	Merge branch 'net-DIM-tx' Tal Gilboa says: ==================== Introduce adaptive TX interrupt moderation to net DIM Net DIM is a library designed for dynamic interrupt moderation. It was implemented and optimized with receive side interrupts in mind, since these are usually the CPU expensive ones. This patch-set introduces adaptive transmit interrupt moderation to net DIM, complete with a usage in the mlx5e driver. Using adaptive TX behavior would reduce interrupt rate for multiple scenarios. Furthermore, it is essential for increasing bandwidth on cases where payload aggregation is required. v3: Remove "inline" from functions in .c files (requested by DaveM). Revert adding "enabled" field from struct net_dim and applied mlx5e structural suggestions (suggested by SaeedM). v2: Rebase over proper tree. v1: Fix compilation issues due to missed function renaming. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 10:15:08 -04:00
Tal Gilboa	cbce4f4447	net/mlx5e: Enable adaptive-TX moderation Add support for adaptive TX moderation. This greatly reduces TX interrupt rate and increases bandwidth, mostly for TCP bandwidth over ARM architecture (below). There is a slight single stream TCP with very large message sizes degradation (x86). In this case if there's any moderation on transmitted packets the bandwidth would reduce due to hitting TCP output limit. Since this is a synthetic case, this is still worth doing. Performance improvement (ConnectX-4Lx 40GbE, ARM) TCP 64B bandwidth with 1-50 streams increased 6-35%. TCP 64B bandwidth with 100-500 streams increased 20-70%. Performance improvement (ConnectX-5 100GbE, x86) Bandwidth: increased up to 40% (1024B with 10s of streams). Interrupt rate: reduced up to 50% (1024B with 1000s of streams). Performance degradation (ConnectX-5 100GbE, x86) Bandwidth: up to 10% decrease single stream TCP (1MB message size from 51Gb/s to 47Gb/s). Signed-off-by: Tal Gilboa <talgi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 10:15:08 -04:00
Tal Gilboa	623ad75522	net/dim: Support adaptive TX moderation Interrupt moderation for TX traffic requires different profiles than RX interrupt moderation. The main goal here is to reduce interrupt rate and allow better payload aggregation by keeping SKBs in the TX queue a bit longer. Ping-pong behavior would get a profile with a short timer, so latency wouldn't increase for these scenarios. There might be a slight degradation in bandwidth for single stream with large message sizes, since net.ipv4.tcp_limit_output_bytes is limiting the allowed TX traffic, but with many streams performance is always improved. Signed-off-by: Tal Gilboa <talgi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 10:15:07 -04:00
Tal Gilboa	026a807c2d	net/dim: Rename _get_profile() functions to _get_rx_moderation() Preparation for introducing adaptive TX to net DIM. Signed-off-by: Tal Gilboa <talgi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 10:15:07 -04:00
Paolo Abeni	db688c24ea	vhost_net: use packet weight for rx handler, too Similar to commit `a2ac99905f` ("vhost-net: set packet weight of tx polling to 2 * vq size"), we need a packet-based limit for handler_rx, too - elsewhere, under rx flood with small packets, tx can be delayed for a very long time, even without busypolling. The pkt limit applied to handle_rx must be the same applied by handle_tx, or we will get unfair scheduling between rx and tx. Tying such limit to the queue length makes it less effective for large queue length values and can introduce large process scheduler latencies, so a constant valued is used - likewise the existing bytes limit. The selected limit has been validated with PVP[1] performance test with different queue sizes: queue size 256 512 1024 baseline 366 354 362 weight 128 715 723 670 weight 256 740 745 733 weight 512 600 460 583 weight 1024 423 427 418 A packet weight of 256 gives peek performances in under all the tested scenarios. No measurable regression in unidirectional performance tests has been detected. [1] https://developers.redhat.com/blog/2017/06/05/measuring-and-comparing-open-vswitch-performance/ Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 10:02:13 -04:00
Xin Long	9cf2f437ca	team: fix netconsole setup over team The same fix in Commit `dbe173079a` ("bridge: fix netconsole setup over bridge") is also needed for team driver. While at it, remove the unnecessary parameter *team from team_port_enable_netpoll(). v1->v2: - fix it in a better way, as does bridge. Fixes: `0fb52a27a0` ("team: cleanup netpoll clode") Reported-by: João Avelino Bellomo Filho <jbellomo@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-24 09:36:21 -04:00
Roopa Prabhu	9c20b9372f	net: fib_rules: fix l3mdev netlink attr processing Fixes: `b16fb418b1` ("net: fib_rules: add extack support") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-23 23:20:12 -04:00
David S. Miller	6cd968f448	Merge branch 'amd-xgbe-fixes' aTom Lendacky says: ==================== amd-xgbe: AMD XGBE driver fixes 2018-04-23 This patch series addresses some issues in the AMD XGBE driver. The following fixes are included in this driver update series: - Improve KR auto-negotiation and training (2 patches) - Add pre and post auto-negotiation hooks - Use the pre and post auto-negotiation hooks to disable CDR tracking during auto-negotiation page exchange in KR mode - Check for SFP tranceiver signal support and only use the signal if the SFP indicates that it is supported This patch series is based on net. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-23 21:24:23 -04:00
Tom Lendacky	117df655f8	amd-xgbe: Only use the SFP supported transceiver signals The SFP eeprom indicates the transceiver signals (Rx LOS, Tx Fault, etc.) that it supports. Update the driver to include checking the eeprom data when deciding whether to use a transceiver signal. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-23 21:24:22 -04:00
Tom Lendacky	96f4d430c5	amd-xgbe: Improve KR auto-negotiation and training Update xgbe-phy-v2.c to make use of the auto-negotiation (AN) phy hooks to improve the ability to successfully complete Clause 73 AN when running at 10gbps. Hardware can sometimes have issues with CDR lock when the AN DME page exchange is being performed. The AN and KR training hooks are used as follows: - The pre AN hook is used to disable CDR tracking in the PHY so that the DME page exchange can be successfully and consistently completed. - The post KR training hook is used to re-enable the CDR tracking so that KR training can successfully complete. - The post AN hook is used to check for an unsuccessful AN which will increase a CDR tracking enablement delay (up to a maximum value). Add two debugfs entries to allow control over use of the CDR tracking workaround. The debugfs entries allow the CDR tracking workaround to be disabled and determine whether to re-enable CDR tracking before or after link training has been initiated. Also, with these changes the receiver reset cycle that is performed during the link status check can be performed less often. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-23 21:24:22 -04:00
Tom Lendacky	4d945663a6	amd-xgbe: Add pre/post auto-negotiation phy hooks Add hooks to the driver auto-negotiation (AN) flow to allow the different phy implementations to perform any steps necessary to improve AN. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-23 21:24:22 -04:00
Guillaume Nault	a49e2f5d5f	pppoe: check sockaddr length in pppoe_connect() We must validate sockaddr_len, otherwise userspace can pass fewer data than we expect and we end up accessing invalid data. Fixes: `224cf5ad14` ("ppp: Move the PPP drivers") Reported-by: syzbot+4f03bdf92fdf9ef5ddab@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-23 21:12:15 -04:00
Guillaume Nault	eb1c28c058	l2tp: check sockaddr length in pppol2tp_connect() Check sockaddr_len before dereferencing sp->sa_protocol, to ensure that it actually points to valid data. Fixes: `fd558d186d` ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Reported-by: syzbot+a70ac890b23b1bf29f5c@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-23 21:10:43 -04:00
Anders Roxell	b300fcf883	selftests: net: update .gitignore with missing test Fixes: `192dc405f3` ("selftests: net: add tcp_mmap program") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-23 21:07:22 -04:00
Jingju Hou	b6a930fa88	net: phy: marvell: clear wol event before setting it If WOL event happened once, the LED[2] interrupt pin will not be cleared unless we read the CSISR register. If interrupts are in use, the normal interrupt handling will clear the WOL event. Let's clear the WOL event before enabling it if !phy_interrupt_is_valid(). Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com> Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-23 21:06:41 -04:00
Colin Ian King	064223c123	dca: make function dca_common_get_tag static Function dca_common_get_tag is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: drivers/dca/dca-core.c:273:4: warning: symbol 'dca_common_get_tag' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-23 21:02:41 -04:00
Daniel Borkmann	b3f8adee85	Merge branch 'bpf-sockmap-fixes' John Fastabend says: ==================== While testing sockmap with more programs (besides our test programs) I found a couple issues. The attached series fixes an issue where pinned maps were not working correctly, blocking sockets returned zero, and an error path that when the sock hit an out of memory case resulted in a double page_put() while doing ingress redirects. See individual patches for more details. v2: Incorporated Daniel's feedback to use map ops for uref put op which also fixed the build error discovered in v1. v3: rename map_put_uref to map_release_uref ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-04-24 00:49:47 +02:00
John Fastabend	4fcfdfb833	bpf: sockmap, fix double page_put on ENOMEM error in redirect path In the case where the socket memory boundary is hit the redirect path returns an ENOMEM error. However, before checking for this condition the redirect scatterlist buffer is setup with a valid page and length. This is never unwound so when the buffers are released latter in the error path we do a put_page() and clear the scatterlist fields. But, because the initial error happens before completing the scatterlist buffer we end up with both the original buffer and the redirect buffer pointing to the same page resulting in duplicate put_page() calls. To fix this simply move the initial configuration of the redirect scatterlist buffer below the sock memory check. Found this while running TCP_STREAM test with netperf using Cilium. Fixes: `fa246693a1` ("bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-04-24 00:49:45 +02:00
John Fastabend	e20f733483	bpf: sockmap, sk_wait_event needed to handle blocking cases In the recvmsg handler we need to add a wait event to support the blocking use cases. Without this we return zero and may confuse user applications. In the wait event any data received on the sk either via sk_receive_queue or the psock ingress list will wake up the sock. Fixes: `fa246693a1` ("bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-04-24 00:49:45 +02:00
John Fastabend	ba6b8de423	bpf: sockmap, map_release does not hold refcnt for pinned maps Relying on map_release hook to decrement the reference counts when a map is removed only works if the map is not being pinned. In the pinned case the ref is decremented immediately and the BPF programs released. After this BPF programs may not be in-use which is not what the user would expect. This patch moves the release logic into bpf_map_put_uref() and brings sockmap in-line with how a similar case is handled in prog array maps. Fixes: `3d9e952697` ("bpf: sockmap, fix leaking maps with attached but not detached progs") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-04-24 00:49:45 +02:00
John Fastabend	4dfe1bb952	bpf: sockmap sample use clang flag, -target bpf Per Documentation/bpf/bpf_devel_QA.txt add the -target flag to the sockmap Makefile. Relevant text quoted here, Otherwise, you can use bpf target. Additionally, you _must_ use bpf target when: - Your program uses data structures with pointer or long / unsigned long types that interface with BPF helpers or context data structures. Access into these structures is verified by the BPF verifier and may result in verification failures if the native architecture is not aligned with the BPF architecture, e.g. 64-bit. An example of this is BPF_PROG_TYPE_SK_MSG require '-target bpf' Fixes: `69e8cc134b` ("bpf: sockmap sample program") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-04-23 23:42:21 +02:00
John Fastabend	514d6c1959	bpf: Document sockmap '-target bpf' requirement for PROG_TYPE_SK_MSG BPF_PROG_TYPE_SK_MSG programs use a 'void *' for both data and the data_end pointers. Additionally, the verifier ensures that every accesses into the values is a __u64 read. This correctly maps on to the BPF 64-bit architecture. However, to ensure that when building on 32bit architectures that clang uses correct types the '-target bpf' option _must_ be specified. To make this clear add a note to the Documentation. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-04-23 23:42:21 +02:00
Roman Gushchin	6899b32b5b	bpf: disable and restore preemption in __BPF_PROG_RUN_ARRAY Running bpf programs requires disabled preemption, however at least some* of the BPF_PROG_RUN_ARRAY users do not follow this rule. To fix this bug, and also to make it not happen in the future, let's add explicit preemption disabling/re-enabling to the __BPF_PROG_RUN_ARRAY code. * for example: [ 17.624472] RIP: 0010:__cgroup_bpf_run_filter_sk+0x1c4/0x1d0 ... [ 17.640890] inet6_create+0x3eb/0x520 [ 17.641405] __sock_create+0x242/0x340 [ 17.641939] __sys_socket+0x57/0xe0 [ 17.642370] ? trace_hardirqs_off_thunk+0x1a/0x1c [ 17.642944] SyS_socket+0xa/0x10 [ 17.643357] do_syscall_64+0x79/0x220 [ 17.643879] entry_SYSCALL_64_after_hwframe+0x42/0xb7 Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-04-23 23:20:11 +02:00
David S. Miller	77621f024d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) Fix SIP conntrack with phones sending session descriptions for different media types but same port numbers, from Florian Westphal. 2) Fix incorrect rtnl_lock mutex logic from IPVS sync thread, from Julian Anastasov. 3) Skip compat array allocation in ebtables if there is no entries, also from Florian. 4) Do not lose left/right bits when shifting marks from xt_connmark, from Jack Ma. 5) Silence false positive memleak in conntrack extensions, from Cong Wang. 6) Fix CONFIG_NF_REJECT_IPV6=m link problems, from Arnd Bergmann. 7) Cannot kfree rule that is already in list in nf_tables, switch order so this error handling is not required, from Florian Westphal. 8) Release set name in error path, from Florian. 9) include kmemleak.h in nf_conntrack_extend.c, from Stepheh Rothwell. 10) NAT chain and extensions depend on NF_TABLES. 11) Out of bound access when renaming chains, from Taehee Yoo. 12) Incorrect casting in xt_connmark leads to wrong bitshifting. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-23 16:22:24 -04:00
David S. Miller	f7c3b12cec	Merge branch 'ipv6-couple-of-fixes-for-rcu-change-to-from' David Ahern says: ==================== net/ipv6: couple of fixes for rcu change to from So many details... I am thankful for all the robots running the permutations and tools. Two bug fixes from the rcu change to rt->from: 1. missing rcu lock in ip6_negative_advice 2. rcu dereferences in 2 sites ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-23 16:12:55 -04:00
David Ahern	8a14e46f14	net/ipv6: Fix missing rcu dereferences on from kbuild test robot reported 2 uses of rt->from not properly accessed using rcu_dereference: 1. add rcu_dereference_protected to rt6_remove_exception_rt and make sure it is always called with rcu lock held. 2. change rt6_do_redirect to take a reference on 'from' when accessed the first time so it can be used the sceond time outside of the lock Fixes: `a68886a691` ("net/ipv6: Make from in rt6_info rcu protected") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-23 16:12:55 -04:00
David Ahern	c3c14da028	net/ipv6: add rcu locking to ip6_negative_advice syzbot reported a suspicious rcu_dereference_check: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 lockdep_rcu_suspicious+0x14a/0x153 kernel/locking/lockdep.c:4592 rt6_check_expired+0x38b/0x3e0 net/ipv6/route.c:410 ip6_negative_advice+0x67/0xc0 net/ipv6/route.c:2204 dst_negative_advice include/net/sock.h:1786 [inline] sock_setsockopt+0x138f/0x1fe0 net/core/sock.c:1051 __sys_setsockopt+0x2df/0x390 net/socket.c:1899 SYSC_setsockopt net/socket.c:1914 [inline] SyS_setsockopt+0x34/0x50 net/socket.c:1911 Add rcu locking around call to rt6_check_expired in ip6_negative_advice. Fixes: `a68886a691` ("net/ipv6: Make from in rt6_info rcu protected") Reported-by: syzbot+2422c9e35796659d2273@syzkaller.appspotmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-23 16:12:54 -04:00

... 3 4 5 6 7 ...

752830 Commits