linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-20 14:39:21 +07:00

Tariq Toukan 461017cb00 net/mlx5e: Support RX multi-packet WQE (Striding RQ)

Introduce the feature of multi-packet WQE (RX Work Queue Element)
referred to as (MPWQE or Striding RQ), in which WQEs are larger
and serve multiple packets each.

Every WQE consists of many strides of the same size, every received
packet is aligned to a beginning of a stride and is written to
consecutive strides within a WQE.

In the regular approach, each regular WQE is big enough to be capable
of serving one received packet of any size up to MTU or 64K in case of
device LRO is enabled, making it very wasteful when dealing with
small packets or device LRO is enabled.

For its flexibility, MPWQE allows a better memory utilization
(implying improvements in CPU utilization and packet rate) as packets
consume strides according to their size, preserving the rest of
the WQE to be available for other packets.

MPWQE default configuration:
	Num of WQEs	= 16
	Strides Per WQE = 2048
	Stride Size	= 64 byte

The default WQEs memory footprint went from 1024 * mtu (~1.5MB) to
16 * 2048 * 64 = 2MB per ring.
However, HW LRO can now be supported at no additional cost in memory
footprint, and hence we turn it on by default and get an even better
performance.

Performance tested on ConnectX4-Lx 50G.
To isolate the feature under test, the numbers below were measured with
HW LRO turned off. We verified that the performance just improves when
LRO is turned back on.

* Netperf single TCP stream:
- BW raised by 10-15% for representative packet sizes:
  default, 64B, 1024B, 1478B, 65536B.

* Netperf multi TCP stream:
- No degradation, line rate reached.

* Pktgen: packet rate raised by 2-10% for traffic of different message
sizes: 64B, 128B, 256B, 1024B, and 1500B.

* Pktgen: packet loss in bursts of small messages (64byte),
single stream:
- | num packets | packets loss before | packets loss after
  |     2K      |        ~ 1K         |         0
  |     8K      |        ~ 6K         |         0
  |     16K     |        ~13K         |         0
  |     32K     |        ~28K         |         0
  |     64K     |        ~57K         |       ~24K

As expected as the driver can receive as many small packets (<=64B) as
the number of total strides in the ring (default = 2048 * 16) vs. 1024
(default ring size regardless of packets size) before this feature.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2016-04-21 15:09:05 -04:00
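
The footprint and burst-capacity figures quoted in the commit message can be reproduced with a little arithmetic. The sketch below is illustrative only and is not taken from the driver. The function names are hypothetical, the MTU of 1500 bytes is an assumption chosen to match the "~1.5MB" figure, and the capacity model is simplified: it assumes a packet never spans two WQEs and ignores any per-packet headroom.

```python
# Defaults quoted in the commit message.
NUM_WQES = 16
STRIDES_PER_WQE = 2048
STRIDE_SIZE = 64          # bytes
LEGACY_RING_SIZE = 1024   # one WQE per packet before this feature


def legacy_footprint(mtu, ring_size=LEGACY_RING_SIZE):
    """Bytes per ring when every WQE is sized for one MTU-sized packet."""
    return ring_size * mtu


def mpwqe_footprint(num_wqes=NUM_WQES, strides_per_wqe=STRIDES_PER_WQE,
                    stride_size=STRIDE_SIZE):
    """Bytes per ring with multi-packet WQEs."""
    return num_wqes * strides_per_wqe * stride_size


def strides_for_packet(packet_bytes, stride_size=STRIDE_SIZE):
    """Strides consumed by one packet: size rounded up to whole strides."""
    return max(1, (packet_bytes + stride_size - 1) // stride_size)


def mpwqe_ring_capacity(packet_bytes, num_wqes=NUM_WQES,
                        strides_per_wqe=STRIDES_PER_WQE,
                        stride_size=STRIDE_SIZE):
    """Packets of one size a full ring can hold (simplified model:
    a packet does not span two WQEs)."""
    per_wqe = strides_per_wqe // strides_for_packet(packet_bytes, stride_size)
    return num_wqes * per_wqe


if __name__ == "__main__":
    mtu = 1500  # assumed MTU, not stated in the commit message
    print("legacy footprint:", legacy_footprint(mtu), "bytes")
    print("MPWQE footprint: ", mpwqe_footprint(), "bytes")
    for size in (64, 128, 256, 1024, 1500):
        print(size, "B ->", strides_for_packet(size), "stride(s),",
              mpwqe_ring_capacity(size), "packets per ring")
```

Under these assumptions a ring holds 16 * 2048 = 32768 packets of 64 bytes, against 1024 with one packet per WQE. That matches the table in the commit message, where bursts of up to 32K packets show no loss after the change.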
..
cmd.h	net/mlx5_core: Fix Mellanox copyright note	2015-04-02 16:33:42 -04:00
cq.h	net/mlx5_core: Fix trimming down IRQ number	2016-01-17 12:08:04 -05:00
device.h	net/mlx5e: Support RX multi-packet WQE (Striding RQ)	2016-04-21 15:09:05 -04:00
doorbell.h	net/mlx5_core: Fix Mellanox copyright note	2015-04-02 16:33:42 -04:00
driver.h	Round two of 4.6 merge window patches	2016-03-22 15:48:44 -07:00
fs.h	net/mlx5_core: Introduce forward to next priority action	2016-03-10 09:22:06 -05:00
mlx5_ifc.h	net/mlx5: Update mlx5_ifc hardware features	2016-04-15 17:21:10 -04:00
port.h	net/mlx5e: Wake On LAN support	2016-02-24 13:50:21 -05:00
qp.h	net/mlx5: Introduce device queue counters	2016-04-21 15:09:04 -04:00
srq.h	net/mlx5_core: Fix Mellanox copyright note	2015-04-02 16:33:42 -04:00
transobj.h	IB/mlx5: Support setting Ethernet priority for Raw Packet QPs	2016-01-21 12:01:09 -05:00
vport.h	net/mlx5_core: Implement modify HCA vport command	2016-03-21 17:13:14 -04:00