linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-24 02:49:05 +07:00

Author	SHA1	Message	Date
Jon Paul Maloy	a853e4c6d0	tipc: introduce replicast as transport option for multicast TIPC multicast messages are currently carried over a reliable 'broadcast link', making use of the underlying media's ability to transport packets as L2 broadcast or IP multicast to all nodes in the cluster. When the used bearer is lacking that ability, we can instead emulate the broadcast service by replicating and sending the packets over as many unicast links as needed to reach all identified destinations. We now introduce a new TIPC link-level 'replicast' service that does this. Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:10:17 -05:00
Jon Paul Maloy	2ae0b8af1f	tipc: add functionality to lookup multicast destination nodes As a further preparation for the upcoming 'replicast' functionality, we add some necessary structs and functions for looking up and returning a list of all nodes that host destinations for a given multicast message. Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:10:16 -05:00
Jon Paul Maloy	9999974a83	tipc: add function for checking broadcast support in bearer As a preparation for the 'replicast' functionality we are going to introduce in the next commits, we need the broadcast base structure to store whether bearer broadcast is available at all from the currently used bearer or bearers. We do this by adding a new function tipc_bearer_bcast_support() to the bearer layer, and letting the bearer selection function in bcast.c use this to give a new boolean field, 'bcast_support' the appropriate value. Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:10:15 -05:00
Gianluca Borello	a5e8c07059	bpf: add bpf_probe_read_str helper Provide a simple helper with the same semantics of strncpy_from_unsafe(): int bpf_probe_read_str(void dst, int size, const void unsafe_addr) This gives more flexibility to a bpf program. A typical use case is intercepting a file name during sys_open(). The current approach is: SEC("kprobe/sys_open") void bpf_sys_open(struct pt_regs ctx) { char buf[PATHLEN]; // PATHLEN is defined to 256 bpf_probe_read(buf, sizeof(buf), ctx->di); / consume buf / } This is suboptimal because the size of the string needs to be estimated at compile time, causing more memory to be copied than often necessary, and can become more problematic if further processing on buf is done, for example by pushing it to userspace via bpf_perf_event_output(), since the real length of the string is unknown and the entire buffer must be copied (and defining an unrolled strnlen() inside the bpf program is a very inefficient and unfeasible approach). With the new helper, the code can easily operate on the actual string length rather than the buffer size: SEC("kprobe/sys_open") void bpf_sys_open(struct pt_regs ctx) { char buf[PATHLEN]; // PATHLEN is defined to 256 int res = bpf_probe_read_str(buf, sizeof(buf), ctx->di); /* consume buf, for example push it to userspace via * bpf_perf_event_output(), but this time we can use * res (the string length) as event size, after checking * its boundaries. */ } Another useful use case is when parsing individual process arguments or individual environment variables navigating current->mm->arg_start and current->mm->env_start: using this helper and the return value, one can quickly iterate at the right offset of the memory area. The code changes simply leverage the already existent strncpy_from_unsafe() kernel function, which is safe to be called from a bpf program as it is used in bpf_trace_printk(). Signed-off-by: Gianluca Borello <g.borello@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:08:43 -05:00
David S. Miller	0760462860	Merge branch 'bus-agnostic-num-vf' Phil Sutter says: ==================== Retrieve number of VFs in a bus-agnostic way Previously, it was assumed that only PCI NICs would be capable of having virtual functions - with my proposed enhancement of dummy NIC driver implementing (fake) ones for testing purposes, this is no longer true. Discussion of said patch has led to the suggestion of implementing a bus-agnostic method for VF count retrieval so rtnetlink could work with both real VF-capable PCI NICs as well as my dummy modifications without introducing ugly hacks. The following series tries to achieve just that by introducing a bus type callback to retrieve a device's number of VFs, implementing this callback for PCI bus and finally adjusting rtnetlink to make use of the generalized infrastructure. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 11:43:17 -05:00
Phil Sutter	9af15c3825	device: Implement a bus agnostic dev_num_vf routine Now that pci_bus_type has num_vf callback set, dev_num_vf can be implemented in a bus type independent way and the check for whether a PCI device is being handled in rtnetlink can be dropped. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 11:43:17 -05:00
Phil Sutter	02e0bea6c8	PCI: implement num_vf bus type callback Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 11:43:16 -05:00
Phil Sutter	582a686f52	device: bus_type: Introduce num_vf callback This allows for bus types to implement their own method of retrieving the number of virtual functions a NIC on that type of bus supports. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 11:43:16 -05:00
Geliang Tang	6c59ebd356	sock: use hlist_entry_safe Use hlist_entry_safe() instead of open-coding it. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 11:38:45 -05:00
Jakub Sitnicki	c10aa71b9d	gre6: Clean up unused struct ipv6_tel_txoption definition Commit `b05229f442` ("gre6: Cleanup GREv6 transmit path, call common GRE functions") removed the ip6gre specific transmit function, but left the struct ipv6_tel_txoption definition. Clean it up. Signed-off-by: Jakub Sitnicki <jkbs@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 11:37:01 -05:00
Eric Dumazet	c2a2efbbfc	net: remove bh disabling around percpu_counter accesses Shaohua Li made percpu_counter irq safe in commit `098faf5805` ("percpu_counter: make APIs irq safe") We can safely remove BH disable/enable sections around various percpu_counter manipulations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 11:27:22 -05:00
Arnd Bergmann	0a327889f6	cxgb4: hide unused warnings The two new variables are only used inside of an #ifdef and cause harmless warnings when that is disabled: drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c: In function 'init_one': drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4646:9: error: unused variable 'port_vec' [-Werror=unused-variable] drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4646:6: error: unused variable 'v' [-Werror=unused-variable] This adds another #ifdef around the declarations. Fixes: `96fe11f27b` ("cxgb4: Implement ndo_get_phys_port_id for mgmt dev") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 11:16:57 -05:00
David Ahern	a1a22c1206	net: ipv6: Keep nexthop of multipath route on admin down IPv6 deletes route entries associated with multipath routes on an admin down where IPv4 does not. For example: $ ip ro ls vrf red unreachable default metric 8192 1.1.1.0/24 metric 64 nexthop via 10.100.1.254 dev eth1 weight 1 nexthop via 10.100.2.254 dev eth2 weight 1 10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.4 10.100.2.0/24 dev eth2 proto kernel scope link src 10.100.2.4 $ ip -6 ro ls vrf red 2001:db8:1::/120 dev eth1 proto kernel metric 256 pref medium 2001:db8:2:: dev red proto none metric 0 pref medium 2001:db8:2::/120 dev eth2 proto kernel metric 256 pref medium 2001:db8:11::/120 via 2001:db8:1::16 dev eth1 metric 1024 pref medium 2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024 pref medium ... Set link down: $ ip li set eth1 down IPv4 retains the multihop route but flags eth1 route as dead: $ ip ro ls vrf red unreachable default metric 8192 1.1.1.0/24 nexthop via 10.100.1.16 dev eth1 weight 1 dead linkdown nexthop via 10.100.2.16 dev eth2 weight 1 10.100.2.0/24 dev eth2 proto kernel scope link src 10.100.2.4 and IPv6 deletes the route as part of flushing all routes for the device: $ ip -6 ro ls vrf red 2001:db8:2:: dev red proto none metric 0 pref medium 2001:db8:2::/120 dev eth2 proto kernel metric 256 pref medium 2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024 pref medium ... Worse, on admin up of the device the multipath route has to be deleted to get this leg of the route re-added. This patch keeps routes that are part of a multipath route if ignore_routes_with_linkdown is set with the dead and linkdown flags enabling consistency between IPv4 and IPv6: $ ip -6 ro ls vrf red 2001:db8:2:: dev red proto none metric 0 pref medium 2001:db8:2::/120 dev eth2 proto kernel metric 256 pref medium 2001:db8:11::/120 via 2001:db8:1::16 dev eth1 metric 1024 dead linkdown pref medium 2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024 pref medium ... Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-19 23:38:51 -05:00
Eric Dumazet	dceeab0e52	mlx4: support __GFP_MEMALLOC for rx Commit `04aeb56a17` ("net/mlx4_en: allocate non 0-order pages for RX ring with __GFP_NOMEMALLOC") added code that appears to be not needed at that time, since mlx4 never used __GFP_MEMALLOC allocations anyway. As using memory reserves is a must in some situations (swap over NFS or iSCSI), this patch adds this flag. Note that this driver does not reuse pages (yet) so we do not have to add anything else. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-19 23:35:12 -05:00
Timur Tabi	8a43c052c7	Revert "net: qcom/emac: configure the external phy to allow pause frames" This reverts commit `3e88449344`. With commit `529ed12752` ("net: phy: phy drivers should not set SUPPORTED_[Asym_]Pause"), phylib now handles automatically enabling pause frame support in the PHY, and the MAC driver should follow suit. Since the EMAC driver driver does this, we no longer need to force pause frames support. Signed-off-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-19 23:14:52 -05:00
Saeed Mahameed	3dd69e3dd2	net/mlx5e: Reorder update stats Reorder update stats flow to update most important counters last, to get more accurate results. New update order: - PCIe counters - Port counters - Vport counters - Queue counters - Software counters Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com>	2017-01-19 23:20:04 +02:00
Gal Pressman	701052c578	net/mlx5: Move cached hca caps to designated caps struct The caps structure consists of hca caps and port/management caps, all under one roof. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-19 23:20:03 +02:00
Gal Pressman	0f7f348192	net/mlx5e: Expose PCIe statistics to ethtool This patch exposes PCIe performance counters, queried with ethtool -S <devname>. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-19 23:20:02 +02:00
Gal Pressman	8ed1a6306d	net/mlx5: Add MPCNT register infrastructure Add the needed infrastructure for future use of MPCNT register. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-19 23:20:01 +02:00
Gal Pressman	5db0a4f64c	net/mlx5e: Expose physical layer statistical counters to ethtool Use ethtool -S to query physical layer statistical counters including: - rx_symbol_errors_phy: Number of symbol errors that were not corrected by FEC correction algorithm or that FEC was not active on this interface. - rx_corrected_bits_phy: Number of corrected bits according to active FEC (RS/FC). Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-19 23:20:01 +02:00
Gal Pressman	d8dc0508c5	net/mlx5: Add PPCNT physical layer statistical group infrastructure Add the needed infrastructure for future use of PPCNT physical layer statistical group. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2017-01-19 23:20:00 +02:00
Gal Pressman	71862561f3	net/mlx5: Query and cache PCAM, MCAM registers on initialization On load_one, we now cache our capabilities registers internally, similar to QUERY_HCA_CAP. Capabilities can later be queried using macros introduced in this patch. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-19 23:19:59 +02:00
Gal Pressman	c835ad6468	net/mlx5: Implement PCAM, MCAM access register commands Introduced registers will expose capabilities of new registers and features related to port/management. Driver will query MCAM and PCAM in order to avoid failing on old firmwares with lack of support. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-19 23:19:58 +02:00
Gal Pressman	cfdcbceaef	net/mlx5: Expose PCAM, MCAM registers infrastructure PCAM: Ports capabilities mask register. MCAM: Management capabilities mask register. PCAM and MCAM registers will provide information regarding firmware support for different features, in order to avoid cases where new driver combined with old firmware results in syndromes (for ex. PCIe counters before this patchset). Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-19 23:19:57 +02:00
Mohamad Haj Yahia	8a271746a2	net/mlx5e: Receive s-tagged packets in promiscuous mode Today when the driver enter to promiscuous mode or vlan filter is disabled, we add flow rule to receive any c-taggd packets, therefore s-tagged packets are dropped. In order to receive s-tagged packets as well we need to add flow rule to receive any s-tagged packet. Fixes: `7cb21b794b` ('net/mlx5e: Rename en_flow_table.c to en_fs.c') Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-19 23:19:56 +02:00
Mohamad Haj Yahia	105433659d	net/mlx5: Add support to s-tag in mlx5 firmware interface Add svlan_tag and rename vlan_tag to cvlan_tag in flow table entry match param. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2017-01-19 23:19:55 +02:00
Eugenia Emantayev	ee7f12205a	net/mlx5e: Implement 1PPS support This patch enables the 1PPS IN and 1PPS OUT support according to the advertised HCA capability. Single pin may be configured to one of the above mutual exclusive functions via standard Linux tools and APIs. For example, testptp open source application. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-19 23:19:54 +02:00
Eugenia Emantayev	f9a1ef720e	net/mlx5: Add MTPPS and MTPPSE registers infrastructure Implement query and set functionality for MTPPS and MTPPSE registers. MTPPS (Management Pulse Per Second) provides the device PPS capabilities, configures the PPS in and out modules and holds the PPS in time stamp. Query MTPPS is supported only when HCA_CAP.pps is set and modify is supported when HCA_CAP.pps_modify is set. MTPPSE (Management Pulse Per Second Event) configures the different event generation modes for PPS. Supported when HCA_CAP.pps is set. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-19 23:19:53 +02:00
Eli Cohen	712bfef609	net/mlx5: Fix version printout in case of health issue Firmware representation of the firmware version on the health buffer has changed for newer device. The representation in the initialization segment does not and will not change. In addition, we print the health buffer firmware version as a raw hex number. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-19 23:19:52 +02:00
Leon Romanovsky	f82eed4523	net/mlx5: Remove information print after attempt to load mlx5_ib module Infiniband part of mlx5 driver can be compiled as a module or as a part of bzImage (compiled in). In the second case, the call to request_module will return an error -ENOENT. It will cause to a misleading print "failed request module on mlx5_ib". This patch removes this print, In order to comply with mlx4. Fixes: `f66f049fb7` ("net/mlx5_core: Request the mlx5 IB module on driver load") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2017-01-19 23:19:51 +02:00
Volodymyr Bendiuga	4567d686f5	phy: increase size of MII_BUS_ID_SIZE and bus_id Some bus names are pretty long and do not fit into 17 chars. Increase therefore MII_BUS_ID_SIZE and phy_fixup.bus_id to larger number. Now mii_bus.id can host larger name. Signed-off-by: Volodymyr Bendiuga <volodymyr.bendiuga@gmail.com> Signed-off-by: Magnus Öberg <magnus.oberg@westermo.se> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-19 11:46:20 -05:00
Andrei.Pistirica@microchip.com	c2594d804d	macb: Common code to enable ptp support for MACB/GEM This patch does the following: - MACB/GEM-PTP interface - registers and bitfields for TSU - capability flags to enable PTP per platform basis Signed-off-by: Andrei Pistirica <andrei.pistirica@microchip.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-19 11:45:54 -05:00
Tobias Klauser	54c30f604d	net: caif: Remove unused stats member from struct chnl_net The stats member of struct chnl_net is used nowhere in the code, so it might as well be removed. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-19 11:45:21 -05:00
Arnd Bergmann	39b7b6a624	net/sched: cls_flower: reduce fl_change stack size The new ARP support has pushed the stack size over the edge on ARM, as there are two large objects on the stack in this function (mask and tb) and both have now grown a bit more: net/sched/cls_flower.c: In function 'fl_change': net/sched/cls_flower.c:928:1: error: the frame size of 1072 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] We can solve this by dynamically allocating one or both of them. I first tried to do it just for the mask, but that only saved 152 bytes on ARM, while this version just does it for the 'tb' array, bringing the stack size back down to 664 bytes. Fixes: `99d31326cb` ("net/sched: cls_flower: Support matching on ARP") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-19 11:18:53 -05:00
Tobias Klauser	4a7c972644	net: Remove usage of net_device last_rx member The network stack no longer uses the last_rx member of struct net_device since the bonding driver switched to use its own private last_rx in commit `9f24273837` ("bonding: use last_arp_rx in slave_last_rx()"). However, some drivers still (ab)use the field for their own purposes and some driver just update it without actually using it. Previously, there was an accompanying comment for the last_rx member added in commit `4dc89133f4` ("net: add a comment on netdev->last_rx") which asked drivers not to update is, unless really needed. However, this commend was removed in commit `f8ff080dac` ("bonding: remove useless updating of slave->dev->last_rx"), so some drivers added later on still did update last_rx. Remove all usage of last_rx and switch three drivers (sky2, atp and smc91c92_cs) which actually read and write it to use their own private copy in netdev_priv. Compile-tested with allyesconfig and allmodconfig on x86 and arm. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Mirko Lindner <mlindner@marvell.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 17:22:49 -05:00
Vivien Didelot	9520ed8fb8	net: dsa: use cpu_switch instead of ds[0] Now that the DSA Ethernet switches are true Linux devices, the CPU switch is not necessarily the first one. If its address is higher than the second switch on the same MDIO bus, its index will be 1, not 0. Avoid any confusion by using dst->cpu_switch instead of dst->ds[0]. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 16:49:47 -05:00
Vivien Didelot	b22de49086	net: dsa: store CPU switch structure in the tree Store a dsa_switch pointer to the CPU switch in the tree instead of only its index. This avoids the need to initialize it to -1. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 16:49:46 -05:00
Ivan Khoronzhuk	e33c2ef106	net: ethernet: ti: davinci_cpdma: correct check on NULL in set rate Check "ch" on NULL first, then get ctlr. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 16:46:01 -05:00
David S. Miller	e3e37e7017	Merge branch 'vhost_net-batching' Jason Wang says: ==================== vhost_net tx batching This series tries to implement tx batching support for vhost. This was done by using MSG_MORE as a hint for under layer socket. The backend (e.g tap) can then batch the packets temporarily in a list and submit it all once the number of bacthed exceeds a limitation. Tests shows obvious improvement on guest pktgen over over mlx4(noqueue) on host: Mpps -+% rx-frames = 0 0.91 +0% rx-frames = 4 1.00 +9.8% rx-frames = 8 1.00 +9.8% rx-frames = 16 1.01 +10.9% rx-frames = 32 1.07 +17.5% rx-frames = 48 1.07 +17.5% rx-frames = 64 1.08 +18.6% rx-frames = 64 (no MSG_MORE) 0.91 +0% Changes from V4: - stick to NAPI_POLL_WEIGHT for rx-frames is user specify a value greater than it. Changes from V3: - use ethtool instead of module parameter to control the maximum number of batched packets - avoid overhead when MSG_MORE were not set and no packet queued Changes from V2: - remove uselss queue limitation check (and we don't drop any packet now) Changes from V1: - drop NAPI handler since we don't use NAPI now - fix the issues that may exceeds max pending of zerocopy - more improvement on available buffer detection - move the limitation of batched pacekts from vhost to tuntap ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 16:35:30 -05:00
Jason Wang	5503fcecd4	tun: rx batching We can only process 1 packet at one time during sendmsg(). This often lead bad cache utilization under heavy load. So this patch tries to do some batching during rx before submitting them to host network stack. This is done through accepting MSG_MORE as a hint from sendmsg() caller, if it was set, batch the packet temporarily in a linked list and submit them all once MSG_MORE were cleared. Tests were done by pktgen (burst=128) in guest over mlx4(noqueue) on host: Mpps -+% rx-frames = 0 0.91 +0% rx-frames = 4 1.00 +9.8% rx-frames = 8 1.00 +9.8% rx-frames = 16 1.01 +10.9% rx-frames = 32 1.07 +17.5% rx-frames = 48 1.07 +17.5% rx-frames = 64 1.08 +18.6% rx-frames = 64 (no MSG_MORE) 0.91 +0% User were allowed to change per device batched packets through ethtool -C rx-frames. NAPI_POLL_WEIGHT were used as upper limitation to prevent bh from being disabled too long. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 16:35:30 -05:00
Jason Wang	0ed005ce02	vhost_net: tx batching This patch tries to utilize tuntap rx batching by peeking the tx virtqueue during transmission, if there's more available buffers in the virtqueue, set MSG_MORE flag for a hint for backend (e.g tuntap) to batch the packets. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 16:35:30 -05:00
Jason Wang	275bf960ac	vhost: better detection of available buffers This patch tries to do several tweaks on vhost_vq_avail_empty() for a better performance: - check cached avail index first which could avoid userspace memory access. - using unlikely() for the failure of userspace access - check vq->last_avail_idx instead of cached avail index as the last step. This patch is need for batching supports which needs to peek whether or not there's still available buffers in the ring. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 16:35:29 -05:00
Mao Wenan	1a8b6d76dc	net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering Relax ordering(RO) is one feature of 82599 NIC, to enable this feature can enhance the performance for some cpu architecure, such as SPARC and so on. Currently it only supports one special cpu architecture(SPARC) in 82599 driver to enable RO feature, this is not very common for other cpu architecture which really needs RO feature. This patch add one common config CONFIG_ARCH_WANT_RELAX_ORDER to set RO feature, and should define CONFIG_ARCH_WANT_RELAX_ORDER in sparc Kconfig firstly. Signed-off-by: Mao Wenan <maowenan@huawei.com> Reviewed-by: Alexander Duyck <alexander.duyck@gmail.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 16:33:00 -05:00
David S. Miller	1e48aac14c	Merge branch 'ipv6-simplify-rt6_fill_node' David Ahern says: ==================== net: ipv6: simplify rt6_fill_node Remove a couple of unnecessary input arguments to rt6_fill_node. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 15:44:00 -05:00
David Ahern	f8cfe2ceb1	net: ipv6: remove prefix arg to rt6_fill_node The prefix arg to rt6_fill_node is non-0 in only 1 path - rt6_dump_route where a user is requesting a prefix only dump. Simplify rt6_fill_node by removing the prefix arg and moving the prefix check to rt6_dump_route. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 15:43:59 -05:00
David Ahern	fd61c6ba31	net: ipv6: remove nowait arg to rt6_fill_node All callers of rt6_fill_node pass 0 for nowait arg. Remove the arg and simplify rt6_fill_node accordingly. rt6_fill_node passes the nowait of 0 to ip6mr_get_route. Remove the nowait arg from it as well. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 15:43:59 -05:00
David S. Miller	1ce463dd75	Merge branch 'sctp-sender-side-stream-reconf-ssn-reset-request-chunk' Xin Long says: ==================== sctp: add sender-side procedures for stream reconf ssn reset request chunk Patch 6/6 is to implement sender-side procedures for the Outgoing and Incoming SSN Reset Request Parameter described in rfc6525 section 5.1.2 and 5.1.3 Patches 1-5/6 are ahead of it to define some apis and asoc members for it. Note that with this patchset, asoc->reconf_enable has no chance yet to be set, until the patch "sctp: add get and set sockopt for reconf_enable" is applied in the future. As we can not just enable it when sctp is not capable of processing reconf chunk yet. v1->v2: - put these into a smaller group. - rename some temporary variables in the codes. - rename the titles of the commits and improve some changelogs. v2->v3: - re-split the patchset and make sure it has no dead codes for review. v3->v4: - move sctp_make_reconf() into patch 1/6 to avoid kbuild warning. - drop unused struct sctp_strreset_req. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 14:55:11 -05:00
Xin Long	7f9d68ac94	sctp: implement sender-side procedures for SSN Reset Request Parameter This patch is to implement sender-side procedures for the Outgoing and Incoming SSN Reset Request Parameter described in rfc6525 section 5.1.2 and 5.1.3. It is also add sockopt SCTP_RESET_STREAMS in rfc6525 section 6.3.2 for users. Note that the new asoc member strreset_outstanding is to make sure only one reconf request chunk on the fly as rfc6525 section 5.1.1 demands. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 14:55:11 -05:00
Xin Long	9fb657aec0	sctp: add sockopt SCTP_ENABLE_STREAM_RESET This patch is to add sockopt SCTP_ENABLE_STREAM_RESET to get/set strreset_enable to indicate which reconf request type it supports, which is described in rfc6525 section 6.3.1. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 14:55:10 -05:00
Xin Long	c28445c3cb	sctp: add reconf_enable in asoc ep and netns This patch is to add reconf_enable field in all of asoc ep and netns to indicate if they support stream reset. When initializing, asoc reconf_enable get the default value from ep reconf_enable which is from netns netns reconf_enable by default. It is also to add reconf_capable in asoc peer part to know if peer supports reconf_enable, the value is set if ext params have reconf chunk support when processing init chunk, just as rfc6525 section 5.1.1 demands. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 14:55:10 -05:00

1 2 3 4 5 ...

649339 Commits