linux_dsm_epyc7002/include/linux/mlx5
Achiad Shochat 88a85f99e5 net/mlx5e: TX latency optimization to save DMA reads
A regular TX WQE execution involves two or more DMA reads -
one to fetch the WQE, and another one per WQE gather entry.

These DMA reads increase the TX latency.
There are two mlx5 mechanisms to bypass these DMA reads:
1) Inline WQE
2) Blue Flame (BF)

An inline WQE contains the whole packet, and thus saves the DMA reads
of the regular WQE gather entries. Inline WQE support was already
added in the previous commit.

A BF WQE is written directly to the device I/O mapped memory, which
saves the DMA read that fetches the WQE.
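
The DMA read accounting described above can be sketched as a small
model. This is an illustrative sketch only, not driver code: the
function name tx_wqe_dma_reads() and its parameters are hypothetical,
and it assumes exactly one read to fetch the WQE plus one read per
gather entry, as stated in the text.

```c
#include <stdbool.h>

/*
 * Hypothetical model (not taken from the mlx5 driver): number of DMA
 * reads the device issues to execute one TX WQE.
 */
static unsigned int tx_wqe_dma_reads(unsigned int gather_entries,
                                     bool inline_wqe, bool blue_flame)
{
    unsigned int reads = 0;

    /* A regular WQE is fetched from host memory; a BF WQE is not. */
    if (!blue_flame)
        reads += 1;

    /* An inline WQE carries the packet, so no gather entries are read. */
    if (!inline_wqe)
        reads += gather_entries;

    return reads;
}
```

For a packet with two gather entries, the model gives 3 reads for a
regular WQE, 2 for a BF WQE with gather entries, and 0 for an inline
WQE that is also written through BF.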

The BF WQE I/O write must be done in cache line granularity, and thus
uses the CPU write combining mechanism.
A BF WQE I/O write also acts as a TX doorbell, notifying the device of
new TX WQEs.
A BF WQE is written to the same I/O mapped address as the regular TX
doorbell, so this address is mapped twice - once by ioremap() and once
by io_mapping_map_wc().
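
The cache line granularity requirement can be illustrated with a
user-space sketch. Everything here is an assumption made for
illustration: the 64-byte cache line size, the names bf_copy_len() and
bf_write(), and the plain array standing in for the write combining
mapping. In kernel code, a helper such as __iowrite64_copy() followed
by a write barrier would target the real I/O mapping instead.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64 /* assumed cache line size */

/* Round a WQE size up to a whole number of cache lines. */
static size_t bf_copy_len(size_t wqe_bytes)
{
    return (wqe_bytes + CACHE_LINE_BYTES - 1) /
           CACHE_LINE_BYTES * CACHE_LINE_BYTES;
}

/*
 * Copy a WQE into a stand-in for the BF register in 64-bit words,
 * always writing whole cache lines. The caller must provide a source
 * buffer padded to bf_copy_len(wqe_bytes) bytes.
 * Returns the number of bytes written.
 */
static size_t bf_write(volatile uint64_t *bf_reg, const uint64_t *wqe,
                       size_t wqe_bytes)
{
    size_t len = bf_copy_len(wqe_bytes);
    size_t i;

    for (i = 0; i < len / sizeof(uint64_t); i++)
        bf_reg[i] = wqe[i];

    return len;
}
```

With these assumptions, a 48-byte WQE results in a 64-byte write and a
65-byte WQE in a 128-byte write.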

While both mechanisms reduce the TX latency, they both consume more CPU
cycles than a regular WQE:
- A BF WQE must still be written to host memory, in addition to being
  written directly to the device I/O mapped memory.
- An inline WQE involves copying the SKB data into it.

To handle this tradeoff, we introduce here a heuristic algorithm that
strives to avoid these two mechanisms when the TX queue is being
back-pressured by the device, and limits their usage rate otherwise.

An inline WQE will always be "Blue Flamed" (written directly to the
device I/O mapped memory), while a BF WQE need not be inlined (it may
contain gather entries).
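
One possible shape for such a heuristic is sketched below. It is a
hypothetical illustration, not the algorithm implemented by this
commit: the budget value, the struct and function names, and the
choice to treat a queue that has not fully drained as back-pressured
are all assumptions. The sketch does keep the rule that inlining
implies BF, while BF does not imply inlining.

```c
#include <stdbool.h>

#define BF_BUDGET 4 /* hypothetical: BF uses allowed between queue drains */

struct txq_state {
    unsigned int outstanding; /* WQEs posted but not yet completed */
    unsigned int bf_budget;   /* remaining BF uses until the next refill */
};

struct tx_decision {
    bool use_bf;
    bool use_inline;
};

/* Decide how to post one packet; inline is only chosen together with BF. */
static struct tx_decision tx_decide(struct txq_state *q,
                                    unsigned int pkt_len,
                                    unsigned int inline_max)
{
    struct tx_decision d = { false, false };

    if (q->bf_budget > 0) {
        d.use_bf = true;
        d.use_inline = pkt_len <= inline_max;
        q->bf_budget--;
    }
    q->outstanding++;
    return d;
}

/* Completion handling: refill the budget only once the queue is drained. */
static void tx_complete(struct txq_state *q, unsigned int completed)
{
    if (completed > q->outstanding)
        completed = q->outstanding;
    q->outstanding -= completed;
    if (q->outstanding == 0)
        q->bf_budget = BF_BUDGET;
}
```

In this sketch the budget bounds how often BF and inlining are used
while the device keeps up, and a queue that never drains (back
pressure) stops using them once the budget runs out.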

Preliminary testing using netperf UDP_RR shows that the latency goes down
from 17.5us to 16.9us, while the message rate (tested with pktgen) stays
the same.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 00:29:17 -07:00
File          Last commit                                                               Date
cmd.h         net/mlx5_core: Fix Mellanox copyright note                                2015-04-02 16:33:42 -04:00
cq.h          net/mlx5_core: Modify CQ moderation parameters                            2015-05-30 18:23:59 -07:00
device.h      net/mlx5e: Add HW cacheline start padding                                 2015-06-11 15:55:25 -07:00
doorbell.h    net/mlx5_core: Fix Mellanox copyright note                                2015-04-02 16:33:42 -04:00
driver.h      net/mlx5e: TX latency optimization to save DMA reads                      2015-07-27 00:29:17 -07:00
flow_table.h  net/mlx5: Ethernet resource handling files                                2015-05-30 18:24:39 -07:00
mlx5_ifc.h    net/mlx5e: Support ETH_RSS_HASH_XOR                                       2015-07-27 00:29:16 -07:00
qp.h          net/mlx5_core: HW data structs/types definitions cleanup                  2015-05-30 18:23:11 -07:00
srq.h         net/mlx5_core: Fix Mellanox copyright note                                2015-04-02 16:33:42 -04:00
vport.h       net/mlx5_core: Fix static checker warnings around system guid query flow  2015-06-07 20:11:17 -07:00