linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-19 05:18:41 +07:00

Author	SHA1	Message	Date
Vladimir Oltean	fa914e9c4d	net: mscc: ocelot: create a helper for changing the port MTU Since in an NPI/DSA setup, not all ports will have the same MTU, we need to make sure the watermarks for pause frames and/or tail dropping logic that existed in the driver is still coherent for the new MTU values. We need to do this because the NPI (aka external CPU) port needs an increased MTU for the DSA tag. This will be done in a future patch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:16 -08:00
Vladimir Oltean	5bc9d2e6e7	net: mscc: ocelot: move invariant configs out of adjust_link It doesn't make sense to rewrite all these registers every time the PHY library notifies us about a link state change. In a future patch we will customize the MTU for the CPU port, and since the MTU was previously configured from adjust_link, if we don't make this change, its value would have got overridden. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:16 -08:00
Claudiu Manoil	dc3de2a294	net: mscc: ocelot: filter out ocelot SoC specific PCS config from common path The adjust_link routine should be generic enough to be (re)used by any SoC that integrates a switch core compatible with the Ocelot core switch driver. Currently all configurations are generic except for the PCS settings that are SoC specific. Move these out to the Ocelot SoC/board instance. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:16 -08:00
Claudiu Manoil	259630e08c	net: mscc: ocelot: move resource ioremap and regmap init to common code Let's make this ioremap and regmap init code common. It should not be platform dependent as it should be usable by PCI devices too. Use better names where necessary to avoid clashes. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:16 -08:00
David S. Miller	e7be235fa7	Merge branch 'net-smc-improve-termination-handling-part-3' Karsten Graul says: ==================== net/smc: improve termination handling (part 3) Part 3 of the SMC termination patches improves the link group termination processing and introduces the ability to immediately terminate a link group. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	0b29ec6436	net/smc: immediate termination for SMCR link groups If the SMC module is unloaded or an IB device is thrown away, the immediate link group freeing introduced for SMCD is exploited for SMCR as well. That means SMCR-specifics are added to smc_conn_kill(). Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	6a37ad3da5	net/smc: wait for tx completions before link freeing Make sure all pending work requests are completed before freeing a link. Dismiss tx pending slots already when terminating a link group to exploit termination shortcut in tx completion queue handler. And kill the completion queue tasklets after destroy of the completion queues, otherwise there is a time window for another tasklet schedule of an already killed tasklet. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	2c1d3e5030	net/smc: abnormal termination without orderly flag For abnormal termination issue an LLC DELETE_LINK without the orderly flag. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	15e1b99aad	net/smc: no WR buffer wait for terminating link group Avoid waiting for a free work request buffer, if the link group is already terminating. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	5edd6b9cb8	net/smc: introduce bookkeeping of SMCD link groups If the ism module is unloaded return control from exit routine only, if all link groups are freed. If an IB device is thrown away return control from device removal only, if all link groups belonging to this device are freed. A counters for the total number of SMCD link groups per ISM device is introduced. ism module unloading continues only if the total number of SMCD link groups for all ISM devices is zero. ISM device removal continues only it the total number of SMCD link groups per ISM device has decreased to zero. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	5421ec281d	net/smc: abnormal termination of SMCD link groups A final cleanup due to SMCD device removal means immediate freeing of all link groups belonging to this device in interrupt context. This patch introduces a separate SMCD link group termination routine, which terminates all link groups of an SMCD device. This new routine smcd_terminate_all ()is reused if the smc module is unloaded. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	42bfba9eaa	net/smc: immediate termination for SMCD link groups SMCD link group termination is called when peer signals its shutdown of its corresponding link group. For regular shutdowns no connections exist anymore. For abnormal shutdowns connections must be killed and their DMBs must be unregistered immediately. That means the SMCR method to delay the link group freeing several seconds does not fit. This patch adds immediate termination of a link group and its SMCD connections and makes sure all SMCD link group related cleanup steps are finished. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	50c6b20eff	net/smc: fix final cleanup sequence for SMCD devices If peer announces shutdown, use the link group terminate worker for local cleanup of link groups and connections to terminate link group in proper context. Make sure link groups are cleaned up first before destroying the event queue of the SMCD device, because link group cleanup may raise events. Send signal shutdown only if peer has not done it already. Send socket abort or close only, if peer has not already announced shutdown. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
David S. Miller	43da44c876	Merge branch 'net-stmmac-CPU-Performance-Improvements' Jose Abreu says: ==================== net: stmmac: CPU Performance Improvements CPU Performance improvements for stmmac. Please check bellow for results before and after the series. Patch 1/7, allows RX Interrupt on Completion to be disabled and only use the RX HW Watchdog. Patch 2/7, setups the default RX coalesce settings instead of using the minimum value. Patch 3/7 and 4/7, removes the uneeded computations for RX Flow Control activation/de-activation, on some cases. Patch 5/7, tunes-up the default coalesce settings. Patch 6/7, re-works the TX coalesce timer activation logic. Patch 7/7, removes the now uneeded TBU interrupt. NetPerf UDP Results: -------------------- Socket Message Elapsed Messages CPU Service Size Size Time Okay Errors Throughput Util Demand bytes bytes secs # # 10^6bits/sec % SS us/KB --- XGMAC@2.5G: Before 212992 1400 10.00 2100620 0 2351.7 36.69 5.112 212992 10.00 2100539 2351.6 26.18 3.648 --- XGMAC@2.5G: After 212992 1400 10.00 2108972 0 2361.5 21.73 3.015 212992 10.00 2097038 2348.1 19.21 2.666 --- GMAC5@1G: Before 212992 1400 10.00 786000 0 880.2 34.71 12.923 212992 10.00 786000 880.2 23.42 8.719 --- GMAC5@1G: After 212992 1400 10.00 842648 0 943.7 14.12 4.903 212992 10.00 842648 943.7 12.73 4.418 Perf TCP Results on RX Path: ---------------------------- --- XGMAC@2.5G: Before 22.51% swapper [stmmac] [k] dwxgmac2_dma_interrupt 10.82% swapper [stmmac] [k] dwxgmac2_host_mtl_irq_status 5.21% swapper [stmmac] [k] dwxgmac2_host_irq_status 4.67% swapper [stmmac] [k] dwxgmac3_safety_feat_irq_status 3.63% swapper [kernel.kallsyms] [k] stack_trace_consume_entry 2.74% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string 2.52% swapper [kernel.kallsyms] [k] update_stack_state 1.94% ksoftirqd/0 [stmmac] [k] dwxgmac2_dma_interrupt 1.45% iperf3 [kernel.kallsyms] [k] queued_spin_lock_slowpath 1.26% swapper [kernel.kallsyms] [k] create_object --- XGMAC@2.5G: After 7.43% swapper [kernel.kallsyms] [k] stack_trace_consume_entry 5.86% swapper [stmmac] [k] dwxgmac2_dma_interrupt 5.68% swapper [kernel.kallsyms] [k] update_stack_state 4.71% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string 2.88% swapper [kernel.kallsyms] [k] create_object 2.69% swapper [stmmac] [k] dwxgmac2_host_mtl_irq_status 2.61% swapper [stmmac] [k] stmmac_napi_poll_rx 2.52% swapper [kernel.kallsyms] [k] unwind_next_frame.part.4 1.48% swapper [kernel.kallsyms] [k] unwind_get_return_address 1.38% swapper [kernel.kallsyms] [k] arch_stack_walk --- GMAC5@1G: Before 31.29% swapper [stmmac] [k] dwmac4_dma_interrupt 14.57% swapper [stmmac] [k] dwmac4_irq_mtl_status 10.66% swapper [stmmac] [k] dwmac4_irq_status 1.97% swapper [kernel.kallsyms] [k] stack_trace_consume_entry 1.73% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string 1.59% swapper [kernel.kallsyms] [k] update_stack_state 1.15% iperf3 [kernel.kallsyms] [k] do_syscall_64 1.01% ksoftirqd/0 [stmmac] [k] dwmac4_dma_interrupt 0.89% swapper [kernel.kallsyms] [k] __default_send_IPI_dest_field 0.75% swapper [stmmac] [k] stmmac_napi_poll_rx --- GMAC5@1G: After 6.70% swapper [kernel.kallsyms] [k] stack_trace_consume_entry 5.79% swapper [stmmac] [k] dwmac4_dma_interrupt 5.29% swapper [kernel.kallsyms] [k] update_stack_state 3.52% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string 2.83% swapper [stmmac] [k] dwmac4_irq_mtl_status 2.62% swapper [kernel.kallsyms] [k] create_object 2.46% swapper [stmmac] [k] stmmac_napi_poll_rx 2.32% swapper [kernel.kallsyms] [k] unwind_next_frame.part.4 2.19% swapper [stmmac] [k] dwmac4_irq_status 1.39% swapper [kernel.kallsyms] [k] unwind_get_return_address ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:42 -08:00
Jose Abreu	8d07a79304	net: stmmac: xgmac: Do not enable TBU interrupt Now that TX Coalesce has been rewritten we no longer need this additional interrupt enabled. This reduces CPU usage. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:41 -08:00
Jose Abreu	c2837423cb	net: stmmac: Rework TX Coalesce logic Coalesce logic currently increments the number of packets and sets the IC bit when the coalesced packets have passed a given limit. This does not reflect very well what coalesce was meant for as we can have a large number of packets that are coalesced and then a single one, sent later on that has the IC bit. Rework the logic so that it coalesces only upon a limit of packets and sets the IC bit for large number of packets. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:41 -08:00
Jose Abreu	da20245100	net: stmmac: Tune-up default coalesce settings Tune-up the defalt coalesce settings for optimal values. This gives the best performance in most of the use-cases. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:41 -08:00
Jose Abreu	52f96cd135	net: stmmac: xgmac: Remove uneeded computation for RFA/RFD RFA and RFD should not be dependent on FIFO size. In fact, the more FIFO space we have, the later we can activate Flow Control. Let's use hard-coded values for RFA and RFD for all FIFO sizes with the exception of 4k, which is a special case. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:41 -08:00
Jose Abreu	854248e5ec	net: stmmac: gmac4+: Remove uneeded computation for RFA/RFD RFA and RFD should not be dependent on FIFO size. In fact, the more FIFO space we have, the later we can activate Flow Control. Let's use hard-coded values for RFA and RFD for all FIFO sizes with the exception of 4k, which is a special case. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:41 -08:00
Jose Abreu	4e4337ccf7	net: stmmac: Setup a default RX Coalesce value instead of the minimum For performance reasons, sometimes using the minimum RX Coalesce value is not optimal. Lets setup a default value that is optimal in most of the use cases. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:41 -08:00
Jose Abreu	09146abebc	net: stmmac: Do not set RX IC bit if RX Coalesce is zero We may only want to use the RX Watchdog so lets check if RX Coalesce settings are non-zero and only set the RX Interrupt on Completion bit if its not. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:41 -08:00
Ido Schimmel	983db6198f	mlxsw: spectrum_router: Allocate discard adjacency entry when needed Commit `0c3cbbf96d` ("mlxsw: Add specific trap for packets routed via invalid nexthops") allocated an adjacency entry during driver initialization whose purpose is to discard packets hitting the route pointing to it. These adjacency entries are allocated from a resource called KVD linear (KVDL). There are situations in which the user can decide to set the size of this resource (via devlink-resource) to 0, in which case the driver will not be able to load. Therefore, instead of pre-allocating this adjacency entry, simply allocate it only when needed. A variable indicating the validity of the entry is added and is used to ensure it is only allocated and written once and that it is freed after all the routes were flushed. Fixes: `0c3cbbf96d` ("mlxsw: Add specific trap for packets routed via invalid nexthops") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:24:54 -08:00
YueHaibing	d6649d788e	net/tls: Fix unused function warning If PROC_FS is not set, gcc warning this: net/tls/tls_proc.c:23:12: warning: 'tls_statistics_seq_show' defined but not used [-Wunused-function] Use #ifdef to guard this. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:12:28 -08:00
David S. Miller	a98cdaf73e	Merge branch 's390-next' Julian Wiedmann says: ==================== s390/qeth: updates 2019-11-14 please apply the following qeth patches to net-next. Along with the usual cleanups, this (1) reduces collateral packet loss in the RX path when dealing with bad packets and/or allocation errors, and (2) simplifies how the L3 driver deals with mcast IP addresses. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	0b81c6c620	s390/qeth: don't check drvdata in sysfs code Given the way how the sysfs attributes are registered / unregistered, the show/store helpers will never be called with a NULL drvdata. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	b80c08ac94	s390/qeth: replace qeth_l3_get_addr_buffer() The remaining usage effectively is a kmemdup() of the query object. By not wrapping it, some of the callers can now use GFP_KERNEL for the allocation. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	8659c189b6	s390/qeth: remove VLAN tracking for L3 devices Use vlan_for_each() instead of tracking each registered VID internally. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	611abe5165	s390/qeth: consolidate L3 mcast registration code Current code processes each (VLAN) device twice - once to inspect the IPv4 mcast addresses, and then a second time to walk the IPv6 mcast addresses. Unify all this into a single helper, thus removing some checks and a duplicated VLAN lookup. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	32a186c7f9	s390/qeth: remove gratuitious RX modeset Trust the IPv4/IPv6 code to properly remove its mcast addresses when a VLAN device is unregistered, and then also trigger an RX modeset whenever it's needed. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	ddf28100ee	s390/qeth: fine-tune L3 mcast locking Push the inet6_dev locking down into the helper that actually needs it for walking the mc_list. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	8311c7a252	s390/qeth: clean up error path in qeth_core_probe_device() qeth_core_free_card() is meant to be the counterpart of qeth_alloc_card() - but unfortunately was also picked as the place to free the QDIO queues. This gets messy when qeth_core_probe_device() fails during qeth_add_dbf_entry(). At this point the card->qdio.state is not initialized yet, so qeth_free_qdio_queues() ends up operating on uninitialized data. Luckily for now, the whole qeth_card struct is zero-allocated and the value of the QETH_QDIO_UNINITIALIZED enum is 0 as well. So there's no real impact from this bug at the moment, it's just really fragile. Clean this up by moving the qeth_free_qdio_queues() call up one level in the hierarchy. This way it doesn't get called from the error path. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	17caeaa476	s390/qeth: handle skb allocation error gracefully When current code fails to allocate an skb in the RX path, it drops the whole RX buffer. Considering the large number of packets that a single RX buffer might contain, this is quite drastic. Skip over the packet instead, and try to extract the next packet from the RX buffer. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	7d4faee7c6	s390/qeth: drop unwanted packets earlier in RX path Packets with an unexpected HW format are currently first extracted from the RX buffer, passed upwards to the layer-specific driver and only then finally dropped. Enhance the RX path so that we can drop such packets before even allocating an skb. For this, add some additional logic so that when a packet is meant to be dropped, we can still walk along the packet's data chunks in the RX buffer. This allows us to extract the following packet(s) from the buffer. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	5fd3fcbb8a	s390/qeth: support per-frame invalidation Each RX buffer may contain up to 64KB worth of data. In case the device needs to discard a packet _after_ already having reserved space for it in the buffer, the whole buffer gets set to ERROR state. As the buffer might contain any number of good packets, this can result in collateral packet loss. qeth can provide relief by enabling per-frame invalidation. The RX buffer is then presented as usual, we just need to spot & drop any individual packet that was flagged as invalid. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:50 -08:00
Julian Wiedmann	845ef9047b	s390/qeth: gather more detailed RX dropped/error statistics Where available, use the fine-grained counters in rtnl_link_stats64 to indicate different RX error causes. For drop reasons, use driver-private ethtool counters. In particular this patch allows us to keep track of driver-side drops due to unknown/unsupported HW descriptor format. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:50 -08:00
David S. Miller	24df31f8d5	Merge branch 'vsock-add-multi-transports-support' Stefano Garzarella says: ==================== vsock: add multi-transports support Most of the patches are reviewed by Dexuan, Stefan, and Jorgen. The following patches need reviews: - [11/15] vsock: add multi-transports support - [12/15] vsock/vmci: register vmci_transport only when VMCI guest/host are active - [15/15] vhost/vsock: refuse CID assigned to the guest->host transport RFC: https://patchwork.ozlabs.org/cover/1168442/ v1: https://patchwork.ozlabs.org/cover/1181986/ v1 -> v2: - Patch 11: + vmci_transport: sent reset when vsock_assign_transport() fails [Jorgen] + fixed loopback in the guests, checking if the remote_addr is the same of transport_g2h->get_local_cid() + virtio_transport_common: updated space available while creating the new child socket during a connection request - Patch 12: + removed 'features' variable in vmci_transport_init() [Stefan] + added a flag to register only once the host [Jorgen] - Added patch 15 to refuse CID assigned to the guest->host transport in the vhost_transport This series adds the multi-transports support to vsock, following this proposal: https://www.spinics.net/lists/netdev/msg575792.html With the multi-transports support, we can use VSOCK with nested VMs (using also different hypervisors) loading both guest->host and host->guest transports at the same time. Before this series, vmci_transport supported this behavior but only using VMware hypervisor on L0, L1, etc. The first 9 patches are cleanups and preparations, maybe some of these can go regardless of this series. Patch 10 changes the hvs_remote_addr_init(). setting the VMADDR_CID_HOST as remote CID instead of VMADDR_CID_ANY to make the choice of transport to be used work properly. Patch 11 adds multi-transports support. Patch 12 changes a little bit the vmci_transport and the vmci driver to register the vmci_transport only when there are active host/guest. Patch 13 prevents the transport modules unloading while sockets are assigned to them. Patch 14 fixes an issue in the bind() logic discoverable only with the new multi-transport support. Patch 15 refuses CID assigned to the guest->host transport in the vhost_transport. I've tested this series with nested KVM (vsock-transport [L0,L1], virtio-transport[L1,L2]) and with VMware (L0) + KVM (L1) (vmci-transport [L0,L1], vhost-transport [L1], virtio-transport[L2]). Dexuan successfully tested the RFC series on HyperV with a Linux guest. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:19 -08:00
Stefano Garzarella	ed8640a961	vhost/vsock: refuse CID assigned to the guest->host transport In a nested VM environment, we have to refuse to assign to a nested guest the same CID assigned to our guest->host transport. In this way, the user can use the local CID for loopback. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:18 -08:00
Stefano Garzarella	36c5b48b91	vsock: fix bind() behaviour taking care of CID When we are looking for a socket bound to a specific address, we also have to take into account the CID. This patch is useful with multi-transports support because it allows the binding of the same port with different CID, and it prevents a connection to a wrong socket bound to the same port, but with different CID. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:18 -08:00
Stefano Garzarella	6a2c096210	vsock: prevent transport modules unloading This patch adds 'module' member in the 'struct vsock_transport' in order to get/put the transport module. This prevents the module unloading while sockets are assigned to it. We increase the module refcnt when a socket is assigned to a transport, and we decrease the module refcnt when the socket is destructed. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:18 -08:00
Stefano Garzarella	b1bba80a43	vsock/vmci: register vmci_transport only when VMCI guest/host are active To allow other transports to be loaded with vmci_transport, we register the vmci_transport as G2H or H2G only when a VMCI guest or host is active. To do that, this patch adds a callback registered in the vmci driver that will be called when the host or guest becomes active. This callback will register the vmci_transport in the VSOCK core. Cc: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:18 -08:00
Stefano Garzarella	c0cfa2d8a7	vsock: add multi-transports support This patch adds the support of multiple transports in the VSOCK core. With the multi-transports support, we can use vsock with nested VMs (using also different hypervisors) loading both guest->host and host->guest transports at the same time. Major changes: - vsock core module can be loaded regardless of the transports - vsock_core_init() and vsock_core_exit() are renamed to vsock_core_register() and vsock_core_unregister() - vsock_core_register() has a feature parameter (H2G, G2H, DGRAM) to identify which directions the transport can handle and if it's support DGRAM (only vmci) - each stream socket is assigned to a transport when the remote CID is set (during the connect() or when we receive a connection request on a listener socket). The remote CID is used to decide which transport to use: - remote CID <= VMADDR_CID_HOST will use guest->host transport; - remote CID == local_cid (guest->host transport) will use guest->host transport for loopback (host->guest transports don't support loopback); - remote CID > VMADDR_CID_HOST will use host->guest transport; - listener sockets are not bound to any transports since no transport operations are done on it. In this way we can create a listener socket, also if the transports are not loaded or with VMADDR_CID_ANY to listen on all transports. - DGRAM sockets are handled as before, since only the vmci_transport provides this feature. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:18 -08:00
Stefano Garzarella	039642574c	hv_sock: set VMADDR_CID_HOST in the hvs_remote_addr_init() Remote peer is always the host, so we set VMADDR_CID_HOST as remote CID instead of VMADDR_CID_ANY. Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:18 -08:00
Stefano Garzarella	55f3e149b6	vsock: move vsock_insert_unbound() in the vsock_create() vsock_insert_unbound() was called only when 'sock' parameter of __vsock_create() was not null. This only happened when __vsock_create() was called by vsock_create(). In order to simplify the multi-transports support, this patch moves vsock_insert_unbound() at the end of vsock_create(). Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:18 -08:00
Stefano Garzarella	b9ca2f5ff7	vsock: add vsock_create_connected() called by transports All transports call __vsock_create() with the same parameters, most of them depending on the parent socket. In order to simplify the VSOCK core APIs exposed to the transports, this patch adds the vsock_create_connected() callable from transports to create a new socket when a connection request is received. We also unexported the __vsock_create(). Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:18 -08:00
Stefano Garzarella	b9f2b0ffde	vsock: handle buffer_size sockopts in the core virtio_transport and vmci_transport handle the buffer_size sockopts in a very similar way. In order to support multiple transports, this patch moves this handling in the core to allow the user to change the options also if the socket is not yet assigned to any transport. This patch also adds the '.notify_buffer_size' callback in the 'struct virtio_transport' in order to inform the transport, when the buffer_size is changed by the user. It is also useful to limit the 'buffer_size' requested (e.g. virtio transports). Acked-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:18 -08:00
Stefano Garzarella	daabfbca34	vsock: add 'struct vsock_sock ' param to vsock_core_get_transport() Since now the 'struct vsock_sock' object contains a pointer to the transport, this patch adds a parameter to the vsock_core_get_transport() to return the right transport assigned to the socket. This patch modifies also the virtio_transport_get_ops(), that uses the vsock_core_get_transport(), adding the 'struct vsock_sock ' parameter. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:18 -08:00
Stefano Garzarella	4c7246dc45	vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock() We are going to add 'struct vsock_sock ' parameter to virtio_transport_get_ops(). In some cases, like in the virtio_transport_reset_no_sock(), we don't have any socket assigned to the packet received, so we can't use the virtio_transport_get_ops(). In order to allow virtio_transport_reset_no_sock() to use the '.send_pkt' callback from the 'vhost_transport' or 'virtio_transport', we add the 'struct virtio_transport ' to it and to its caller: virtio_transport_recv_pkt(). We moved the 'vhost_transport' and 'virtio_transport' definition, to pass their address to the virtio_transport_recv_pkt(). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:18 -08:00
Stefano Garzarella	fe502c4a38	vsock: add 'transport' member in the struct vsock_sock As a preparation to support multiple transports, this patch adds the 'transport' member at the 'struct vsock_sock'. This new field is initialized during the creation in the __vsock_create() function. This patch also renames the global 'transport' pointer to 'transport_single', since for now we're only supporting a single transport registered at run-time. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:18 -08:00
Stefano Garzarella	3603a2e991	vsock: remove include/linux/vm_sockets.h file This header file now only includes the "uapi/linux/vm_sockets.h". We can include directly it when needed. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:17 -08:00
Stefano Garzarella	db205c7668	vsock: remove vm_sockets_get_local_cid() vm_sockets_get_local_cid() is only used in virtio_transport_common.c. We can replace it calling the virtio_transport_get_ops() and using the get_local_cid() callback registered by the transport. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:17 -08:00

1 2 3 4 5 ...

874481 Commits