linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-18 05:17:39 +07:00

Author	SHA1	Message	Date
David S. Miller	12ff91c8ba	Merge branch 'improving-TCP-behavior-on-host-congestion' Yuchung Cheng says: ==================== improving TCP behavior on host congestion This patch set aims to improve how TCP handle local qdisc congestion by simplifying the previous implementation. Previously when an skb fails to (re)transmit due to local qdisc congestion or other resource issue, TCP refrains from setting the skb timestamp or the recovery starting time. This design makes determining when to abort a stalling socket more complicated, as the timestamps of these tranmission attempts were missing. The stack needs to sort of infer when the original attempt happens. A by-product is a socket may disregard the system timeout limit (i.e. sysctl net.ipv4.tcp_retries2 or USER_TIMEOUT option), and continue to retry until the transmission is successful. In data-center environment when TCP RTO is small, this could cause the socket to retry frequently for long during qdisc congestion. The solution is to first unconditionally timestamp skb and recovery attempt. Then retry more conservatively (twice a second) on local qdisc congestion but abort the sockets according to the system limit. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Yuchung Cheng	c1d5674f83	tcp: less aggressive window probing on local congestion Previously when the sender fails to send (original) data packet or window probes due to congestion in the local host (e.g. throttling in qdisc), it'll retry within an RTO or two up to 500ms. In low-RTT networks such as data-centers, RTO is often far below the default minimum 200ms. Then local host congestion could trigger a retry storm pouring gas to the fire. Worse yet, the probe counter (icsk_probes_out) is not properly updated so the aggressive retry may exceed the system limit (15 rounds) until the packet finally slips through. On such rare events, it's wise to retry more conservatively (500ms) and update the stats properly to reflect these incidents and follow the system limit. Note that this is consistent with the behaviors when a keep-alive probe or RTO retry is dropped due to local congestion. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Yuchung Cheng	590d2026d6	tcp: retry more conservatively on local congestion Previously when the sender fails to retransmit a data packet on timeout due to congestion in the local host (e.g. throttling in qdisc), it'll retry within an RTO up to 500ms. In low-RTT networks such as data-centers, RTO is often far below the default minimum 200ms (and the cap 500ms). Then local host congestion could trigger a retry storm pouring gas to the fire. Worse yet, the retry counter (icsk_retransmits) is not properly updated so the aggressive retry may exceed the system limit (15 rounds) until the packet finally slips through. On such rare events, it's wise to retry more conservatively (500ms) and update the stats properly to reflect these incidents and follow the system limit. Note that this is consistent with the behavior when a keep-alive probe is dropped due to local congestion. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Yuchung Cheng	9721e709fa	tcp: simplify window probe aborting on USER_TIMEOUT Previously we use the next unsent skb's timestamp to determine when to abort a socket stalling on window probes. This no longer works as skb timestamp reflects the last instead of the first transmission. Instead we can estimate how long the socket has been stalling with the probe count and the exponential backoff behavior. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Yuchung Cheng	01a523b071	tcp: create a helper to model exponential backoff Create a helper to model TCP exponential backoff for the next patch. This is pure refactor w no behavior change. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Yuchung Cheng	c7d13c8faa	tcp: properly track retry time on passive Fast Open This patch addresses a corner issue on timeout behavior of a passive Fast Open socket. A passive Fast Open server may write and close the socket when it is re-trying SYN-ACK to complete the handshake. After the handshake is completely, the server does not properly stamp the recovery start time (tp->retrans_stamp is 0), and the socket may abort immediately on the very first FIN timeout, instead of retying until it passes the system or user specified limit. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Yuchung Cheng	7ae189759c	tcp: always set retrans_stamp on recovery Previously TCP socket's retrans_stamp is not set if the retransmission has failed to send. As a result if a socket is experiencing local issues to retransmit packets, determining when to abort a socket is complicated w/o knowning the starting time of the recovery since retrans_stamp may remain zero. This complication causes sub-optimal behavior that TCP may use the latest, instead of the first, retransmission time to compute the elapsed time of a stalling connection due to local issues. Then TCP may disrecard TCP retries settings and keep retrying until it finally succeed: not a good idea when the local host is already strained. The simple fix is to always timestamp the start of a recovery. It's worth noting that retrans_stamp is also used to compare echo timestamp values to detect spurious recovery. This patch does not break that because retrans_stamp is still later than when the original packet was sent. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Yuchung Cheng	7f12422c48	tcp: always timestamp on every skb transmission Previously TCP skbs are not always timestamped if the transmission failed due to memory or other local issues. This makes deciding when to abort a socket tricky and complicated because the first unacknowledged skb's timestamp may be 0 on TCP timeout. The straight-forward fix is to always timestamp skb on every transmission attempt. Also every skb retransmission needs to be flagged properly to avoid RTT under-estimation. This can happen upon receiving an ACK for the original packet and the a previous (spurious) retransmission has failed. It's worth noting that this reverts to the old time-stamping style before commit `8c72c65b42` ("tcp: update skb->skb_mstamp more carefully") which addresses a problem in computing the elapsed time of a stalled window-probing socket. The problem will be addressed differently in the next patches with a simpler approach. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Yuchung Cheng	88f8598d0a	tcp: exit if nothing to retransmit on RTO timeout Previously TCP only warns if its RTO timer fires and the retransmission queue is empty, but it'll cause null pointer reference later on. It's better to avoid such catastrophic failure and simply exit with a warning. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:12:26 -08:00
Heiner Kallweit	9b420eff9f	net: phy: micrel: use phy_read_mmd and phy_write_mmd This driver implements open-coded versions of phy_read_mmd() and phy_write_mmd() for KSZ9031. That's not needed, let's use the phylib functions directly. This is compile-tested only because I have no such hardware. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:10:00 -08:00
Mathieu Malaterre	43deda5408	davicom: Annotate implicit fall through in dm9000_set_io There is a plan to build the kernel with -Wimplicit-fallthrough and this place in the code produced a warning (W=1). This commit removes the following warning: include/linux/device.h:1480:5: warning: this statement may fall through [-Wimplicit-fallthrough=] drivers/net/ethernet/davicom/dm9000.c:397:3: note: in expansion of macro 'dev_dbg' drivers/net/ethernet/davicom/dm9000.c:398:2: note: here Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:08:17 -08:00
David Herrmann	49b4994c14	net/ipv6/udp_tunnel: prefer SO_BINDTOIFINDEX over SO_BINDTODEVICE The udp-tunnel setup allows binding sockets to a network device. Prefer the new SO_BINDTOIFINDEX to avoid temporarily resolving the device-name just to look it up in the ioctl again. Reviewed-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 14:55:52 -08:00
David Herrmann	2eadee72db	net/ipv4/udp_tunnel: prefer SO_BINDTOIFINDEX over SO_BINDTODEVICE The udp-tunnel setup allows binding sockets to a network device. Prefer the new SO_BINDTOIFINDEX to avoid temporarily resolving the device-name just to look it up in the ioctl again. Reviewed-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 14:55:52 -08:00
David Herrmann	f5dd3d0c96	net: introduce SO_BINDTOIFINDEX sockopt This introduces a new generic SOL_SOCKET-level socket option called SO_BINDTOIFINDEX. It behaves similar to SO_BINDTODEVICE, but takes a network interface index as argument, rather than the network interface name. User-space often refers to network-interfaces via their index, but has to temporarily resolve it to a name for a call into SO_BINDTODEVICE. This might pose problems when the network-device is renamed asynchronously by other parts of the system. When this happens, the SO_BINDTODEVICE might either fail, or worse, it might bind to the wrong device. In most cases user-space only ever operates on devices which they either manage themselves, or otherwise have a guarantee that the device name will not change (e.g., devices that are UP cannot be renamed). However, particularly in libraries this guarantee is non-obvious and it would be nice if that race-condition would simply not exist. It would make it easier for those libraries to operate even in situations where the device-name might change under the hood. A real use-case that we recently hit is trying to start the network stack early in the initrd but make it survive into the real system. Existing distributions rename network-interfaces during the transition from initrd into the real system. This, obviously, cannot affect devices that are up and running (unless you also consider moving them between network-namespaces). However, the network manager now has to make sure its management engine for dormant devices will not run in parallel to these renames. Particularly, when you offload operations like DHCP into separate processes, these might setup their sockets early, and thus have to resolve the device-name possibly running into this race-condition. By avoiding a call to resolve the device-name, we no longer depend on the name and can run network setup of dormant devices in parallel to the transition off the initrd. The SO_BINDTOIFINDEX ioctl plugs this race. Reviewed-by: Tom Gundersen <teg@jklm.no> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 14:55:51 -08:00
Vakul Garg	692d7b5d1f	tls: Fix recvmsg() to be able to peek across multiple records This fixes recvmsg() to be able to peek across multiple tls records. Without this patch, the tls's selftests test case 'recv_peek_large_buf_mult_recs' fails. Each tls receive context now maintains a 'rx_list' to retain incoming skb carrying tls records. If a tls record needs to be retained e.g. for peek case or for the case when the buffer passed to recvmsg() has a length smaller than decrypted record length, then it is added to 'rx_list'. Additionally, records are added in 'rx_list' if the crypto operation runs in async mode. The records are dequeued from 'rx_list' after the decrypted data is consumed by copying into the buffer passed to recvmsg(). In case, the MSG_PEEK flag is used in recvmsg(), then records are not consumed or removed from the 'rx_list'. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 14:20:40 -08:00
David S. Miller	fb73d62025	Merge branch 'dsa-lantiq_gswip-probe-fixes-and-remove-cleanup' Johan Hovold says: ==================== net: dsa: lantiq_gswip: probe fixes and remove cleanup This series fix a few issues found through inspection when fixing up new bad uses of of_find_compatible_node() that have crept in since 4.19. Note that these have only been compile tested. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 12:12:19 -08:00
Johan Hovold	8bb18f69c7	net: dsa: lantiq_gswip: drop bogus drvdata check The platform-device driver data is set on successful probe and will never be NULL on remove (or we have much bigger problems). Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 12:12:19 -08:00
Johan Hovold	c8cbcb0d8b	net: dsa: lantiq_gswip: fix OF child-node lookups Use the new of_get_compatible_child() helper to look up child nodes to avoid ever matching non-child nodes elsewhere in the tree. Also fix up the related struct device_node leaks. Fixes: `14fceff477` ("net: dsa: Add Lantiq / Intel DSA driver for vrx200") Cc: stable <stable@vger.kernel.org> # 4.20 Cc: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 12:12:19 -08:00
Johan Hovold	aed13f2e00	net: dsa: lantiq_gswip: fix use-after-free on failed probe Make sure to disable and deregister the switch on late probe errors to avoid use-after-free when the device-resource-managed switch is freed. Fixes: `14fceff477` ("net: dsa: Add Lantiq / Intel DSA driver for vrx200") Cc: stable <stable@vger.kernel.org> # 4.20 Cc: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 12:12:19 -08:00
Bert Kenward	5fb1beecea	sfc: extend MTD support for newer hardware The X2 family of NICs (based on the SFC9250) have additional MTD partitions for firmware and configuration. This includes partitions that are read-only. The NICs also have extended versions of the NVRAM interface, allowing more detailed status information to be returned. Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 12:10:14 -08:00
Vakul Garg	cea3bfb374	selftests/tls: Fix recv partial/large_buff test cases TLS test cases recv_partial & recv_peek_large_buf_mult_recs expect to receive a certain amount of data and then compare it against known strings using memcmp. To prevent recvmsg() from returning lesser than expected number of bytes (compared in memcmp), MSG_WAITALL needs to be passed in recvmsg(). Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:57:45 -08:00
Heiner Kallweit	13d0ab6750	net: phy: check return code when requesting PHY driver module When requesting the PHY driver module fails we'll bind the genphy driver later. This isn't obvious to the user and may cause, depending on the PHY, different types of issues. Therefore check the return code of request_module(). Note that we only check for failures in loading the module, not whether a module exists for the respective PHY ID. v2: - add comment explaining what is checked and what is not - return error from phy_device_create() if loading module fails Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:56:46 -08:00
YueHaibing	01cb8a1a64	net/tls: Make function tls_sw_do_sendpage static Fixes the following sparse warning: net/tls/tls_sw.c:1023:5: warning: symbol 'tls_sw_do_sendpage' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:45:21 -08:00
YueHaibing	f3de19af0f	net/tls: remove unused function tls_sw_sendpage_locked There are no in-tree callers. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:44:58 -08:00
Vakul Garg	fda497e5f5	Optimize sk_msg_clone() by data merge to end dst sg entry Function sk_msg_clone has been modified to merge the data from source sg entry to destination sg entry if the cloned data resides in same page and is contiguous to the end entry of destination sk_msg. This improves kernel tls throughput to the tune of 10%. When the user space tls application calls sendmsg() with MSG_MORE, it leads to calling sk_msg_clone() with new data being cloned placed continuous to previously cloned data. Without this optimization, a new SG entry in the destination sk_msg i.e. rec->msg_plaintext in tls_clone_plaintext_msg() gets used. This leads to exhaustion of sg entries in rec->msg_plaintext even before a full 16K of allowable record data is accumulated. Hence we lose oppurtunity to encrypt and send a full 16K record. With this patch, the kernel tls can accumulate full 16K of record data irrespective of the size of data passed in sendmsg() with MSG_MORE. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:42:26 -08:00
Gustavo A. R. Silva	4559dd2482	net: hns: Use struct_size() in devm_kzalloc() One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; instance = devm_kzalloc(dev, sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = devm_kzalloc(dev, struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:33:46 -08:00
Florian Fainelli	5db5ea995f	net: phy: Add helpers to determine if PHY driver is generic We are already checking in phy_detach() that the PHY driver is of generic kind (1G or 10G) and we are going to make use of that in the SFP layer as well for 1000BaseT SFP modules, so expose helper functions to return that information. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:33:17 -08:00
David S. Miller	6f24e15991	Merge branch 'dsa-Split-platform-data-to-header-file' Florian Fainelli says: ==================== net: dsa: Split platform data to header file This patch series decouples the DSA platform data structures from net/dsa.h which was getting used for all sorts of DSA related structures. It would probably make sense for this series to go via David's net-next tree to avoid conflicts on the ARM part, since we cannot obviously include a header that does not yet exist. No functional changes intended. ==================== Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:31:24 -08:00
Florian Fainelli	8cfb5faf32	net: dsa: Include platform_data header file b53 and mv88e6xxx support passing platform_data, and now that we have split the platform_data portion from the main net/dsa.h header file, include only the relevant parts. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:31:24 -08:00
Florian Fainelli	e5f02a3109	ARM: orion5x: Include platform_data/dsa.h Now that we have split the DSA platform data structures from the main net/dsa.h header file, include only the relevant header file. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:31:24 -08:00
Florian Fainelli	ecfc937210	net: dsa: Split platform data to header file Instead of having net/dsa.h contain both the internal switch tree/driver structures, split the relevant platform_data parts into include/linux/platform_data/dsa.h and make that header be included by net/dsa.h in order not to break any setup. A subsequent set of patches will update code including net/dsa.h to include only the platform_data header. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:31:24 -08:00
Xue Chaojing	905b464ad9	net-next/hinic: replace disable_irq_nosync/enable_irq In order to avoid frequent system interrupts when sending and receiving packets. we replace disable_irq_nosync/enable_irq with hinic_set_msix_state(), hinic_set_msix_state is used to access memory mapped hinic devices. Signed-off-by: Xue Chaojing <xuechaojing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:22:36 -08:00
Florian Fainelli	da7b9e9b00	net: dsa: Add ndo_get_phys_port_name() for CPU port There is not currently way to infer the port number through sysfs that is being used as the CPU port number. Overlay a ndo_get_phys_port_name() operation onto the DSA master network device in order to retrieve that information. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 21:12:21 -08:00
Florian Fainelli	44543f1dd2	Documentation: networking: dsa: Update documentation Since `83c0afaec7` ("net: dsa: Add new binding implementation"), DSA is no longer a platform device exclusively and can support registering DSA switches from other bus drivers (PCI, USB, I2C, etc.). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 21:11:35 -08:00
Gustavo A. R. Silva	78c787c21f	cxgb4/l2t: Use struct_size() in kvzalloc() One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 21:11:09 -08:00
Gustavo A. R. Silva	c5c3899de0	openvswitch: meter: Use struct_size() in kzalloc() One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 21:10:47 -08:00
Heiner Kallweit	3fcb3f9b68	net: phy: don't include asm/irq.h directly There's no need to and one shouldn't include asm/irq.h directly. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 21:08:50 -08:00
Heiner Kallweit	c3a6a174d5	net: phy: improve logging in phylib Some time ago phydev_info() and friends have been added. They allow to improve and simplify logging. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 21:03:44 -08:00
Heiner Kallweit	1868e3d722	net: phy: remove preliminary workaround for not loading PHY driver This workaround attempt helped for some but not all affected users. With commit `11287b693d` ("r8169: load Realtek PHY driver module before r8169") we have a better workaround now, so we an remove the first attempt. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 20:56:47 -08:00
David S. Miller	159882f42c	Merge branch 'nfp-flower-improve-flower-resilience' Jakub Kicinski says: ==================== nfp: flower: improve flower resilience This series contains mostly changes which improve nfp flower offload's resilience, but are too large or risky to push into net. Fred makes the driver waits for flower FW responses uninterruptible, and a little longer (~40ms). Pieter adds support for cards with multiple rule memories. John reworks the MAC offloads. He says: > When potential tunnel end-point MACs are offloaded, they are assigned an > index. This index may be associated with a port number meaning that if a > packet matches an offloaded MAC address on the card, then the ingress > port for that MAC can also be verified. In the case of shared MACs (e.g. > on a linux bond) there may be situations where this index maps to only > one of the ports that share the MAC. > > The idea of 'global' MAC indexes are supported that bypass the check on > ingress port on the NFP. The patchset tracks shared MACs and assigns > global indexes to these. It also ensures that port based indexes are > re-applied if a single port becomes the only user of an offloaded MAC. > > Other patches in the set aim to tidy code without changing functionality. > There is also a delete offload message introduced to ensure that MACs no > longer in use in kernel space are removed from the firmware lookup tables. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 15:23:15 -08:00
John Hurley	20cce88650	nfp: flower: enable MAC address sharing for offloadable devs A MAC address is not necessarily a unique identifier for a netdev. Drivers such as Linux bonds, for example, can apply the same MAC address to the upper layer device and all lower layer devices. NFP MAC offload for tunnel decap includes port verification for reprs but also supports the offload of non-repr MAC addresses by assigning 'global' indexes to these. This means that the FW will not verify the incoming port of a packet matching this destination MAC. Modify the MAC offload logic to assign global indexes based on MAC address instead of net device (as it currently does). Use this to allow multiple devices to share the same MAC. In other words, if a repr shares its MAC address with another device then give the offloaded MAC a global index rather than associate it with an ingress port. Track this so that changes can be reverted as MACs stop being shared. Implement this by removing the current list based assignment of global indexes and replacing it with an rhashtable that maps an offloaded MAC address to the number of devices sharing it, distributing global indexes based on this. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 15:23:15 -08:00
John Hurley	13cf71031d	nfp: flower: ensure MAC cleanup on address change It is possible to receive a MAC address change notification without the net device being down (e.g. when an OvS bridge is assigned the same MAC as a port added to it). This means that an offloaded MAC address may not be removed if its device gets a new address. Maintain a record of the offloaded MAC addresses for each repr and netdev assigned a MAC offload index. Use this to delete the (now expired) MAC if a change of address event occurs. Only handle change address events if the device is already up - if not then the netdev up event will handle it. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 15:23:15 -08:00
John Hurley	05d2bee6bd	nfp: flower: add infastructure for non-repr priv data NFP repr netdevs contain private data that can store per port information. In certain cases, the NFP driver offloads information from non-repr ports (e.g. tunnel ports). As the driver does not have control over non-repr netdevs, it cannot add/track private data directly to the netdev struct. Add infastructure to store private information on any non-repr netdev that is offloaded at a given time. This is used in a following patch to track offloaded MAC addresses for non-reprs and enable correct house keeping on address changes. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 15:23:15 -08:00
John Hurley	49402b0b7f	nfp: flower: ensure deletion of old offloaded MACs When a potential tunnel end point goes down then its MAC address should not be matchable on the NFP. Implement a delete message for offloaded MACs and call this on net device down. While at it, remove the actions on register and unregister netdev events. A MAC should only be offloaded if the device is up. Note that the netdev notifier will replay any notifications for UP devices on registration so NFP can still offload ports that exist before the driver is loaded. Similarly, devices need to go down before they can be unregistered so removal of offloaded MACs is only required on down events. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 15:23:15 -08:00
John Hurley	0115dcc314	nfp: flower: remove list infastructure from MAC offload Potential MAC destination addresses for tunnel end-points are offloaded to firmware. This was done by building a list of such MACs and writing to firmware as blocks of addresses. Simplify this code by removing the list format and sending a new message for each offloaded MAC. This is in preparation for delete MAC messages. There will be one delete flag per message so we cannot assume that this applies to all addresses in a list. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 15:23:15 -08:00
John Hurley	41da0b5ef3	nfp: flower: ignore offload of VF and PF repr MAC addresses Currently MAC addresses of all repr netdevs, along with selected non-NFP controlled netdevs, are offloaded to FW as potential tunnel end-points. However, the addresses of VF and PF reprs are meaningless outside of internal communication and it is only those of physical port reprs required. Modify the MAC address offload selection code to ignore VF/PF repr devs. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 15:23:15 -08:00
John Hurley	f3b975778c	nfp: flower: tidy tunnel related private data Recent additions to the flower app private data have grouped the variables of a given feature into a struct and added that struct to the main private data struct. In keeping with this, move all tunnel related private data to their own struct. This has no affect on functionality but improves readability and maintenance of the code. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 15:23:15 -08:00
Pieter Jansen van Vuuren	467322e262	nfp: flower: support multiple memory units for filter offloads Adds support for multiple memory units which are used for filter offloads. Each filter is assigned a stats id, the MSBs of the id are used to determine which memory unit the filter should be offloaded to. The number of available memory units that could be used for filter offload is obtained from HW. A simple round robin technique is used to allocate and distribute the ids across memory units. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 15:23:14 -08:00
Fred Lotter	96439889b4	nfp: flower: increase cmesg reply timeout QA tests report occasional timeouts on REIFY message replies. Profiling of the two cmesg reply types under burst conditions, with a 12-core host under heavy cpu and io load (stress --cpu 12 --io 12), show both PHY MTU change and REIFY replies can exceed the 10ms timeout. The maximum MTU reply wait under burst is 16ms, while the maximum REIFY wait under 40 VF burst is 12ms. Using a 4 VF REIFY burst results in an 8ms maximum wait. A larger VF burst does increase the delay, but not in a linear enough way to justify a scaled REIFY delay. The worse case values between MTU and REIFY appears close enough to justify a common timeout. Pick a conservative 40ms to make a safer future proof common reply timeout. The delay only effects the failure case. Change the REIFY timeout mechanism to use wait_event_timeout() instead of wait_event_interruptible_timeout(), to match the MTU code. In the current implementation, theoretically, a signal could interrupt the REIFY waiting period, with a return code of ERESTARTSYS. However, this is caught under the general timeout error code EIO. I cannot see the benefit of exposing the REIFY waiting period to signals with such a short delay (40ms), while the MTU mechnism does not use the same logic. In the absence of any reply (wakeup() call), both reply types will wake up the task after the timeout period. The REIFY timeout applies to the entire representor group being instantiated (e.g. VFs), while the MTU timeout apples to a single PHY MTU change. Signed-off-by: Fred Lotter <frederik.lotter@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 15:23:14 -08:00
Colin Ian King	bdbe8cc1a3	net: sungem: fix indentation, remove a tab The declaration of variable 'found' is one level too deep, fix this by removing a tab. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-16 14:04:59 -08:00

1 2 3 4 5 ...

810779 Commits