Commit Graph

Julian Wiedmann
40e6a22584 s390/qeth: remove qeth_get_elements_no()
Convert the last remaining user of qeth_get_elements_no() to
qeth_count_elements(), so this helper can be removed.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 09:10:25 -07:00
Julian Wiedmann
0a6da4b10d s390/qeth: remove unused L3 xmit code
qeth_l3_xmit() is now only used for TSOv4 traffic, shrink it down.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 09:10:25 -07:00
Julian Wiedmann
f13ade1993 s390/qeth: run non-offload L3 traffic over common xmit path
L3 OSAs can only offload IPv4 traffic, so use the common L2 transmit
path for all other traffic.
In particular there's no support for TX VLAN offload, so any such packet
needs to be manually de-accelerated via ndo_features_check().

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 09:10:24 -07:00
Julian Wiedmann
fc69660bbd s390/qeth: move L2 xmit code to core module
We need the exact same transmit path for non-offload-eligible traffic on
L3 OSAs. So make it accessible from both sub-drivers.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 09:10:24 -07:00
Weilin Chang
75b2c206bb liquidio: Add the features to show FEC settings and set FEC settings
1. Add functions for get_fecparam and set_fecparam.
2. Modify lio_get_link_ksettings to display the FEC setting.

Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:22:26 -07:00
zhong jiang
a4ebec033e net: ethernet: remove redundant null pointer check before of_node_put
of_node_put already takes the NULL pointer case into account, so it is
safe to remove the duplicated check before calling of_node_put.
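
A minimal sketch of the pattern being removed (hypothetical call site):

    /* before: redundant NULL check */
    if (np)
        of_node_put(np);

    /* after: of_node_put() itself accepts NULL */
    of_node_put(np);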

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Vladimir Zapolskiy <vz@mleia.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:21:45 -07:00
zhong jiang
b458925ed5 net: dsa: remove redundant null pointer check before put_device
put_device already takes the NULL pointer case into account, so it is
safe to remove the duplicated check before calling put_device.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:20:11 -07:00
zhong jiang
1ddc5d3e5f net: dsa: remove redundant null pointer check before of_node_put
of_node_put already takes the NULL pointer case into account, so it is
safe to remove the duplicated check before calling of_node_put.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:20:11 -07:00
zhong jiang
764ea3714a net: usb: remove redundant null pointer check before of_node_put
of_node_put already takes the NULL pointer case into account, so it is
safe to remove the duplicated check before calling of_node_put.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:20:11 -07:00
Zhu Yanjun
1635bb548f net: rds: use memset to optimize the recv
The function rds_inc_init is on the receive path. Using memset to zero
memory in one call optimizes rds_inc_init.
The test result:

     Before:
     1) + 24.950 us   |        rds_inc_init [rds]();
     After:
     1) + 10.990 us   |        rds_inc_init [rds]();
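
A hedged sketch of the general pattern (not the actual rds code): a
per-element zeroing loop collapses into a single memset over the array.

    /* hypothetical before: zero an array element by element */
    for (i = 0; i < N_TRACES; i++)
        trace[i] = 0;

    /* after: one call; sizeof(trace) covers the whole array */
    memset(trace, 0, sizeof(trace));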

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:19:51 -07:00
Vakul Garg
0185e2e69f selftests/tls: Add MSG_WAITALL in recv() syscall
A number of tls selftests rely upon recv() returning an exact number of
data bytes. When tls record crypto is done using an async accelerator,
recv() can return fewer bytes than expected, which makes many test
cases fail. To fix this, pass MSG_WAITALL in the flags of the recv()
syscall.
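
For illustration, the flag in a plain recv() call (a generic sketch,
not a specific test):

    /* MSG_WAITALL makes recv() block until the full length arrives
     * (or an error/EOF occurs), instead of returning a partial read. */
    ret = recv(fd, buf, sizeof(buf), MSG_WAITALL);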

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:15:03 -07:00
Friedemann Gerold
6f9dbadc1a net: aquantia: memory corruption on jumbo frames
This patch fixes corruption of the skb shared area upon reception
of 4K jumbo packets.

Originally, build_skb was used to reuse the page for the skb and
eliminate the need for extra fragments. But that logic does not take
into account that skb_shared_info must be reserved at the end of the
skb data area.

When the packet data consumes the whole page (4K), the skb_shinfo
location overflows the page. As a consequence, __build_skb zeroes
shinfo data beyond the allocated page, corrupting the next page.

The issue is rarely seen in real life because jumbo frames are normally
larger than 4K, which triggers another code path. But it is 100%
reproducible with a simple scapy packet, like:

    sendp(IP(dst="192.168.100.3") / TCP(dport=443) \
          / Raw(RandString(size=(4096-40))), iface="enp1s0")
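
The constraint behind the fix, sketched (generic, not the driver's
actual code): when reusing a page via build_skb(), the usable frame
space must leave room for the shared info at the end.

    /* build_skb() expects skb_shared_info to fit at the end of the
     * buffer, so a 4K page can only hold this much frame data: */
    max_frame = PAGE_SIZE - SKB_DATA_ALIGN(sizeof(struct skb_shared_info));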

Fixes: 018423e90b ("net: ethernet: aquantia: Add ring support code")
Reported-by: Friedemann Gerold <f.gerold@b-c-s.de>
Reported-by: Michael Rauch <michael@rauch.be>
Signed-off-by: Friedemann Gerold <f.gerold@b-c-s.de>
Tested-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:14:14 -07:00
David S. Miller
d10f7e1d9e Merge branch 'lantiq-Minor-fixes-for-vrx200-and-gswip'
Hauke Mehrtens says:

====================
net: lantiq: Minor fixes for vrx200 and gswip

These are mostly minor fixes for problems raised in the latest round
of review of the original series adding these drivers, which were not
applied before the patches got merged into net-next.
In addition, the series fixes a data bus error on poweroff.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:12:12 -07:00
Hauke Mehrtens
711ddb625c net: dsa: tag_gswip: Add gswip to dsa_tag_protocol_to_str()
The gswip tag was missing from the dsa_tag_protocol_to_str() function; add it.

Fixes: 7969119293 ("net: dsa: Add Lantiq / Intel GSWIP tag support")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:12:11 -07:00
Hauke Mehrtens
0e630b598e net: dsa: lantiq_gswip: Minor code style improvements
Use one code block when returning because the interface type is
unsupported, and also check whether an unsupported port gets
configured. In addition, fix a duplicated "the" and use
dsa_is_cpu_port() instead of manually getting the CPU port.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:12:11 -07:00
Hauke Mehrtens
a44ecfbda4 net: lantiq: lantiq_xrx200: Move clock prepare to probe function
The switch and the MAC are in one IP core and they use the same clock
signal from the clock generation unit.
Currently the clock architecture in the Lantiq SoC code does not allow
the same clocks to be shared easily; this has to be fixed by switching
to the common clock framework.
As a workaround, activate the clock of the switch and the MAC when the
MAC gets probed, and only disable it when the MAC gets removed. This
ensures that the clock is always enabled while the switch or MAC is in
use. The switch cannot be used without the MAC.

This fixes a data bus error when rebooting the system: the switch and
MAC were deactivated, and some registers were later accessed during
cleanup while the clocks were disabled.

Fixes: fe1a56420c ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:12:11 -07:00
Hauke Mehrtens
e82b5fe5d4 dt-bindings: net: dsa: lantiq, xrx200-gswip: Fix minor style issues
* Use one compatible string per line in the documentation
* Remove the SoC-revision-dependent compatible strings; we can detect
  the revision in the driver
* Use lower case letters in hex addresses
* Fix the sizes of the address ranges in the example so they match the
  sizes used by the SoC. The old ones also work; they just include some
  empty address space.
* Change the reg size of the gphy-fw node

Fixes: 86ce2bc73c ("dt-bindings: net: dsa: Add lantiq, xrx200-gswip DT bindings")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: devicetree@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:12:11 -07:00
Hauke Mehrtens
d52030e6d5 dt-bindings: net: lantiq, xrx200-net: Use lower case in hex
Use lower case letters in the addresses of the device tree binding.
In addition, replace "eth" with "ethernet" and fix the size of the reg
element in the example. The additional range does not contain any
registers but is used by the IP block on this SoC.

Fixes: 839790e88a ("dt-bindings: net: Add lantiq, xrx200-net DT bindings")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: devicetree@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:12:11 -07:00
Wei Yongjun
0a959e4584 net: hns: make function hns_gmac_wait_fifo_clean() static
Fixes the following sparse warning:

drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c:322:5: warning:
 symbol 'hns_gmac_wait_fifo_clean' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:07:04 -07:00
Wei Yongjun
b8b2de91e9 net: lantiq: Fix return value check in xrx200_probe()
In case of error, the function devm_ioremap_resource() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().
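
The corrected check follows the usual idiom (a sketch of the pattern):

    base = devm_ioremap_resource(&pdev->dev, res);
    if (IS_ERR(base))
        return PTR_ERR(base);   /* not: if (!base) */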

Fixes: fe1a56420c ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:06:41 -07:00
Wei Yongjun
f592e0b989 net: dsa: gswip: Fix copy-paste error in gswip_gphy_fw_probe()
The return value from of_reset_control_array_get_exclusive() is not
checked correctly. The test is done against the wrong variable. This
patch fixes it.

Fixes: 14fceff477 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:06:19 -07:00
Wei Yongjun
f5de8bfef8 net: dsa: gswip: Fix return value check in gswip_probe()
In case of error, the function devm_ioremap_resource() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().

Fixes: 14fceff477 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:05:58 -07:00
John Fastabend
7a3dd8c897 tls: async support causes out-of-bounds access in crypto APIs
When async support was added it needed to access the sk from the async
callback to report errors up the stack. The patch tried to use space
after the aead request struct by directly setting the reqsize field in
aead_request. This is an internal field that should not be used
outside the crypto APIs. It is used by the crypto code to define extra
space for private structures used in the crypto context. Users of the
API then use crypto_aead_reqsize() and add the returned amount of
bytes to the end of the request memory allocation before posting the
request to encrypt/decrypt APIs.

So this breaks (with a general protection fault, and a KASAN error if
enabled) because the request sent to decrypt is shorter than required,
causing out-of-bounds accesses in the crypto API. It also seems
unlikely that the sk is even valid by the time it gets to the callback,
because of a memset in the crypto layer.

Fix this by holding the sk in the skb->sk field when the callback is
set up; because the skb is already passed through to the callback
handler via a void pointer, we can access it there. In the handler we
then need to be careful to NULL the pointer again before kfree_skb().
I added comments on both the setup (in tls_do_decryption) and the point
where we clear it in the crypto callback handler tls_decrypt_done().
After this, selftests pass again and the KASAN errors/warnings are
fixed.
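
For reference, the allocation pattern the crypto API expects from its
users (a generic sketch, not the tls code; it mirrors what
aead_request_alloc() does internally):

    /* reserve the transform's private context behind the request */
    req = kzalloc(sizeof(struct aead_request) +
                  crypto_aead_reqsize(tfm), GFP_KERNEL);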

Fixes: 94524d8fc9 ("net/tls: Add support for async decryption of tls records")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Vakul Garg <Vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17 08:01:36 -07:00
Haishuang Yan
a82738adff ip6_gre: simplify gre header parsing in ip6gre_err
As in ip_gre, use gre_parse_header to parse the GRE header in the GRE
error handler code.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16 15:32:59 -07:00
Haishuang Yan
b0350d51f0 ip_gre: fix parsing gre header in ipgre_err
gre_parse_header stops parsing when a checksum error (csum_err) is
encountered, which means tpi->key is left undefined and
ip_tunnel_lookup will improperly return NULL.

This patch allows passing a NULL pointer as the csum_err parameter.
Even when a checksum error is encountered, gre_parse_header then won't
return an error and continues parsing the GRE header as expected.
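
The call shape in the error handler then looks like this (a sketch;
arguments abbreviated):

    /* NULL csum_err: keep parsing even on a checksum error */
    if (gre_parse_header(skb, &tpi, NULL, htons(ETH_P_IP), 0) < 0)
        return;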

Fixes: 9f57c67c37 ("gre: Remove support for sharing GRE protocol hook.")
Reported-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16 15:32:59 -07:00
Florian Fainelli
21e65923ab net: phy: et011c: Remove incorrect PHY_POLL flags
PHY_POLL is defined as -1, which means that we would be setting all
flags of the PHY driver. It is also not a valid flag to tell PHYLIB
about, so just remove it.
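
A sketch of the offending pattern (hypothetical driver snippet):

    static struct phy_driver et011c_driver = {
        /* PHY_POLL is -1: as a .flags value it sets every bit rather
         * than selecting polling, so the line is simply removed. */
        .flags = PHY_POLL,
    };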

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16 15:31:01 -07:00
David S. Miller
50676de486 Merge branch 'act_police-lockless-data-path'
Davide Caratti says:

====================
net/sched: act_police: lockless data path

the data path of the 'police' action can be faster if we avoid using spinlocks:
 - patch 1 converts act_police to use per-cpu counters
 - patch 2 lets act_police use RCU to access its configuration data.

test procedure (using pktgen from https://github.com/netoptimizer):
 # ip link add name eth1 type dummy
 # ip link set dev eth1 up
 # tc qdisc add dev eth1 clsact
 # tc filter add dev eth1 egress matchall action police \
 > rate 2gbit burst 100k conform-exceed pass/pass index 100
 # for c in 1 2 4; do
 > ./pktgen_bench_xmit_mode_queue_xmit.sh -v -s 64 -t $c -n 5000000 -i eth1
 > done

test results (avg. pps/thread):

  $c | before patch |  after patch | improvement
 ----+--------------+--------------+-------------
   1 |      3518448 |      3591240 |  irrelevant
   2 |      3070065 |      3383393 |         10%
   4 |      1540969 |      3238385 |        110%
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16 15:30:23 -07:00
Davide Caratti
2d550dbad8 net/sched: act_police: don't use spinlock in the data path
Use RCU instead of spinlocks to protect concurrent reads/writes of the
act_police configuration. This reduces the effects of contention in the
data path when multiple readers are present.
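
The read side this enables looks roughly like the usual RCU pattern
(field names hypothetical, not the actual act_police code):

    rcu_read_lock();
    p = rcu_dereference(police->params);    /* hypothetical field */
    /* ... read rate/burst configuration without the action spinlock ... */
    rcu_read_unlock();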

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16 15:30:22 -07:00
Davide Caratti
93be42f917 net/sched: act_police: use per-cpu counters
Use per-CPU counters instead of sharing a single set of stats across
all cores. This removes the need for a spinlock when statistics are
read or updated.
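
A generic sketch of the per-CPU stats idiom (names hypothetical, not
the act_police code):

    struct pkt_stats *s = this_cpu_ptr(stats); /* this core's copy */
    u64_stats_update_begin(&s->syncp);
    s->packets++;
    s->bytes += skb->len;
    u64_stats_update_end(&s->syncp);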

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-16 15:30:22 -07:00
Ganesh Goudar
c3ec8bcceb cxgb4: update supported DCB version
- In the CXGB4_DCB_STATE_FW_INCOMPLETE state, check whether the DCB
  version has changed and update the supported DCB version.

- Also, fill in the priority code point value for priority-based
  flow control.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-14 08:50:23 -07:00
Ganesh Goudar
992bea8e40 cxgb4: add per rx-queue counter for packet errors
Print per-rx-queue packet errors in sge_qinfo.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-14 08:40:53 -07:00
Ganesh Goudar
0dc235afc5 cxgb4: Fix endianness issue in t4_fwcache()
Do not put a host-endian 0 or 1 into a big-endian field.
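
The general idiom (a sketch, not the actual t4_fwcache() field):

    __be32 val = cpu_to_be32(1);    /* not: val = 1 */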

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-14 08:40:53 -07:00
Li RongQing
52bb6677d5 net: move definition of pcpu_lstats to header file
pcpu_lstats is defined in several files, so unify the definitions into
one and move it to a header file.
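
The duplicated definition has this shape (as in, e.g., the loopback
driver's copy):

    struct pcpu_lstats {
        u64 packets;
        u64 bytes;
        struct u64_stats_sync syncp;
    };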

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-14 08:32:23 -07:00
Kees Cook
ee4fccbee7 net/ibm/emac: Remove VLA usage
In the quest to remove all stack VLA usage from the kernel[1], this
removes the VLA used for the size of the emac xaht registers. Since the
number of registers can only ever be 4 or 8, as detected in
emac_init_config(), the max can be hardcoded and a runtime test added
for robustness.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
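
A sketch of the pattern (macro and field names assumed for
illustration, not the exact driver code):

    u32 regs[EMAC_XAHT_SLOTS_MAX];  /* was: u32 regs[dev->xaht_slots]; */

    if (WARN_ON(dev->xaht_slots > EMAC_XAHT_SLOTS_MAX))
        return;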

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christian Lamparter <chunkeey@gmail.com>
Cc: Ivan Mikhaylov <ivan@de.ibm.com>
Cc: netdev@vger.kernel.org
Co-developed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 16:53:24 -07:00
Gustavo A. R. Silva
f91845da9f pktgen: Fix fall-through annotation
Replace "fallthru" with a proper "fall through" annotation.

This fix is part of the ongoing effort to enable
-Wimplicit-fallthrough
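
For context, the comment form that -Wimplicit-fallthrough recognizes
(a generic example):

    switch (cmd) {
    case CMD_START:
        start();
        /* fall through */  /* silences -Wimplicit-fallthrough */
    case CMD_STOP:
        stop();
        break;
    }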

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 15:36:41 -07:00
Gustavo A. R. Silva
310fc0513e tg3: Fix fall-through annotations
Replace "fallthru" with a proper "fall through" annotation.

This fix is part of the ongoing effort to enable
-Wimplicit-fallthrough

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 15:36:41 -07:00
Toke Høiland-Jørgensen
50c12f7401 gso_segment: Reset skb->mac_len after modifying network header
When splitting a GSO segment that consists of encapsulated packets, the
skb->mac_len of the segments can end up being set wrong, causing packet
drops in particular when using act_mirred and ifb interfaces in
combination with a qdisc that splits GSO packets.

This happens because at the time skb_segment() is called, network_header
will point to the inner header, throwing off the calculation in
skb_reset_mac_len(). The network_header is subsequently adjusted by the
outer IP gso_segment handlers, but they don't set the mac_len.

Fix this by adding skb_reset_mac_len() calls to both the IPv4 and IPv6
gso_segment handlers, after they modify the network_header.
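
A sketch of the fix shape inside a gso_segment handler (abridged):

    skb->network_header = (u8 *)iph - skb->head;  /* outer header again */
    skb_reset_mac_len(skb);  /* mac_len = network_header - mac_header */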

Many thanks to Eric Dumazet for his help in identifying the cause of
the bug.

Acked-by: Dave Taht <dave.taht@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 12:08:40 -07:00
YueHaibing
293681f149 vxlan: Remove duplicated include from vxlan.h
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 12:07:56 -07:00
Florian Fainelli
b2ddc48a81 net: dsa: b53: Do not fail when IRQ are not initialized
When the Device Tree does not provide the per-port interrupts, do not
fail during b53_srab_irq_enable() but instead bail out gracefully. The
SRAB driver is used on the BCM5301X (Northstar) platforms, which do not
yet have the SRAB interrupts wired up.

Fixes: 16994374a6 ("net: dsa: b53: Make SRAB driver manage port interrupts")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 10:19:14 -07:00
David S. Miller
8bb83b7838 Merge branch 'vhost_net-TX-batching'
Jason Wang says:

====================
vhost_net TX batching

This series tries to batch submitting packets to the underlying socket
through msg_control during sendmsg(). This is done by:

1) Doing the userspace copy inside vhost_net
2) Building the XDP buff
3) Batching at most 64 (VHOST_NET_BATCH) XDP buffs and submitting them
   at once through msg_control during sendmsg().
4) Letting the underlying sockets use the XDP buffs directly when XDP
   is enabled, or build skbs based on the XDP buff.

For packets that cannot be built easily with XDP, or when batch
submission is hard (e.g. sndbuf is limited), we go for the previous
slow path, passing an iov iterator to the underlying socket through
sendmsg() once per packet.

This can help to improve cache utilization and avoid lots of indirect
calls with sendmsg(). It can also cooperate with the batching support
of the underlying sockets (e.g. the case of XDP redirection through
maps).

Testpmd (txonly) in the guest shows obvious improvements:

Test                /+pps%
XDP_DROP on TAP     /+44.8%
XDP_REDIRECT on TAP /+29%
macvtap (skb)       /+26%

Netperf TCP_STREAM TX from the guest shows obvious improvements on
small packets:

    size/session/+thu%/+normalize%
       64/     1/   +2%/    0%
       64/     2/   +3%/   +1%
       64/     4/   +7%/   +5%
       64/     8/   +8%/   +6%
      256/     1/   +3%/    0%
      256/     2/  +10%/   +7%
      256/     4/  +26%/  +22%
      256/     8/  +27%/  +23%
      512/     1/   +3%/   +2%
      512/     2/  +19%/  +14%
      512/     4/  +43%/  +40%
      512/     8/  +45%/  +41%
     1024/     1/   +4%/    0%
     1024/     2/  +27%/  +21%
     1024/     4/  +38%/  +73%
     1024/     8/  +15%/  +24%
     2048/     1/  +10%/   +7%
     2048/     2/  +16%/  +12%
     2048/     4/    0%/   +2%
     2048/     8/    0%/   +2%
     4096/     1/  +36%/  +60%
     4096/     2/  -11%/  -26%
     4096/     4/    0%/  +14%
     4096/     8/    0%/   +4%
    16384/     1/   -1%/   +5%
    16384/     2/    0%/   +2%
    16384/     4/    0%/   -3%
    16384/     8/    0%/   +4%
    65535/     1/    0%/  +10%
    65535/     2/    0%/   +8%
    65535/     4/    0%/   +1%
    65535/     8/    0%/   +3%

Please review.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:25:41 -07:00
Jason Wang
0a0be13b8f vhost_net: batch submitting XDP buffers to underlayer sockets
This patch implements XDP batching for vhost_net. The idea is to first
try to do the userspace copy and build the XDP buff directly in vhost.
Instead of submitting each packet immediately, vhost_net batches them
in an array and submits every 64 (VHOST_NET_BATCH) packets to the
underlying sockets through the msg_control of sendmsg().

When XDP is enabled on the TUN/TAP, TUN/TAP can process XDP inside a
loop without caring about GUP, so it can do batched map flushing. When
XDP is not enabled or not supported, the underlying socket needs to
build an skb and pass it to the network core. The batched packet
submission allows us to do batching like netif_receive_skb_list() in
the future.

This saves lots of indirect calls for better cache utilization. For the
cases where we can't do batching, e.g. when sndbuf is limited or the
packet size is too large, we go the usual one-packet-per-sendmsg() way.

Doing testpmd on various setups gives us:

Test                /+pps%
XDP_DROP on TAP     /+44.8%
XDP_REDIRECT on TAP /+29%
macvtap (skb)       /+26%

Netperf tests show obvious improvements for small packet transmission:

size/session/+thu%/+normalize%
   64/     1/   +2%/    0%
   64/     2/   +3%/   +1%
   64/     4/   +7%/   +5%
   64/     8/   +8%/   +6%
  256/     1/   +3%/    0%
  256/     2/  +10%/   +7%
  256/     4/  +26%/  +22%
  256/     8/  +27%/  +23%
  512/     1/   +3%/   +2%
  512/     2/  +19%/  +14%
  512/     4/  +43%/  +40%
  512/     8/  +45%/  +41%
 1024/     1/   +4%/    0%
 1024/     2/  +27%/  +21%
 1024/     4/  +38%/  +73%
 1024/     8/  +15%/  +24%
 2048/     1/  +10%/   +7%
 2048/     2/  +16%/  +12%
 2048/     4/    0%/   +2%
 2048/     8/    0%/   +2%
 4096/     1/  +36%/  +60%
 4096/     2/  -11%/  -26%
 4096/     4/    0%/  +14%
 4096/     8/    0%/   +4%
16384/     1/   -1%/   +5%
16384/     2/    0%/   +2%
16384/     4/    0%/   -3%
16384/     8/    0%/   +4%
65535/     1/    0%/  +10%
65535/     2/    0%/   +8%
65535/     4/    0%/   +1%
65535/     8/    0%/   +3%

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:25:41 -07:00
Jason Wang
0efac27791 tap: accept an array of XDP buffs through sendmsg()
This patch implements the TUN_MSG_PTR msg_control type. This type
allows the caller to pass an array of XDP buffs to tuntap through the
ptr field of tun_msg_ctl. Tap will build skbs from those XDP buffers.

This avoids lots of indirect calls, thus improving icache utilization,
and allows batched XDP flushing when doing XDP redirection.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:25:41 -07:00
Jason Wang
043d222f93 tuntap: accept an array of XDP buffs through sendmsg()
This patch implements the TUN_MSG_PTR msg_control type. This type
allows the caller to pass an array of XDP buffs to tuntap through the
ptr field of tun_msg_ctl. If an XDP program is attached, tuntap can run
the XDP program directly. If not, tuntap will build an skb and do a
fast receive, since part of the work has already been done by
vhost_net.

This avoids lots of indirect calls, thus improving icache utilization,
and allows batched XDP flushing when doing XDP redirection.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:25:40 -07:00
Jason Wang
fe8dd45bb7 tun: switch to new type of msg_control
This patch introduces a new tun/tap-specific msg_control:

#define TUN_MSG_UBUF 1
#define TUN_MSG_PTR  2
struct tun_msg_ctl {
       int type;
       void *ptr;
};

This allows us to pass different kinds of msg_control through
sendmsg(). The first supported type is ubuf (TUN_MSG_UBUF), which will
be used by the existing vhost_net zerocopy code. The second is the XDP
buff, which allows vhost_net to pass XDP buffs to TUN. This is used to
implement accepting an array of XDP buffs from vhost_net in the
following patches.
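
A hedged usage sketch for the zerocopy case (caller-side shape only;
details may differ from the actual vhost_net code):

    struct tun_msg_ctl ctl = {
        .type = TUN_MSG_UBUF,
        .ptr  = ubuf,           /* zerocopy completion info */
    };
    msg.msg_control = &ctl;
    msg.msg_controllen = sizeof(ctl);
    sock->ops->sendmsg(sock, &msg, len);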

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:25:40 -07:00
Jason Wang
1a097910ad tuntap: move XDP flushing out of tun_do_xdp()
This will allow adding batch flushing on top.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:25:40 -07:00
Jason Wang
8ae1aff0b3 tuntap: split out XDP logic
This patch splits out the XDP logic into a single function, so that it
can be reused by the XDP batching path in the following patch.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:25:40 -07:00
Jason Wang
ac1f1f6c5a tuntap: tweak on the path of skb XDP case in tun_build_skb()
If we're sure we won't go through native XDP, there's no need for
several things like the bh and RCU handling. So this patch introduces a
helper that builds the skb and holds the page refcount. When we find
we'll go through the skb path, build the skb directly.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:25:40 -07:00
Jason Wang
f7053b6ccb tuntap: simplify error handling in tun_build_skb()
There's no need to duplicate the page-get logic in each action. So
this patch gets the page and calculates the offset before processing
the XDP actions (except for XDP_DROP), and undoes this when an error is
met (we don't care about performance on errors). This will be used for
factoring out the XDP logic.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:25:40 -07:00
Jason Wang
291aeb2b1d tuntap: enable bh early during processing XDP
This patch moves the bh enabling a little earlier; this will be used
for factoring out the core XDP logic of tuntap.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:25:40 -07:00
Jason Wang
4f23aff871 tuntap: switch to use XDP_PACKET_HEADROOM
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13 09:25:40 -07:00