linux_dsm_epyc7002/drivers
Sivakumar Krishnasamy 66aa0678ef ibmveth: Support to enable LSO/CSO for Trunk VEA.
Current largesend and checksum offload feature in ibmveth driver,
 - Source VM sends the TCP packets with ip_summed field set as
   CHECKSUM_PARTIAL and TCP pseudo header checksum is placed in
   checksum field
 - CHECKSUM_PARTIAL flag in SKB will enable ibmveth driver to mark
   "no checksum" and "checksum good" bits in transmit buffer descriptor
   before the packet is delivered to pseries PowerVM Hypervisor
 - If ibmveth has largesend capability enabled, transmit buffer descriptors
   are market accordingly before packet is delivered to Hypervisor
   (along with mss value for packets with length > MSS)
 - Destination VM's ibmveth driver receives the packet with "checksum good"
   bit set and so, SKB's ip_summed field is set with CHECKSUM_UNNECESSARY
 - If "largesend" bit was on, mss value is copied from receive descriptor
   into SKB's gso_size and other flags are appropriately set for
   packets > MSS size
 - The packet is now successfully delivered up the stack in destination VM

The offloads described above works fine for TCP communication among VMs in
the same pseries server ( VM A <=> PowerVM Hypervisor <=> VM B )

We are now enabling support for OVS in pseries PowerVM environment. One of
our requirements is to have ibmveth driver configured in "Trunk" mode, when
they are used with OVS. This is because, PowerVM Hypervisor will no more
bridge the packets between VMs, instead the packets are delivered to
IO Server which hosts OVS to bridge them between VMs or to external
networks (flow shown below),
  VM A <=> PowerVM Hypervisor <=> IO Server(OVS) <=> PowerVM Hypervisor
                                                                   <=> VM B
In "IO server" the packet is received by inbound Trunk ibmveth and then
delivered to OVS, which is then bridged to outbound Trunk ibmveth (shown
below),
        Inbound Trunk ibmveth <=> OVS <=> Outbound Trunk ibmveth

In this model, we hit the following issues which impacted the VM
communication performance,

 - Issue 1: ibmveth doesn't support largesend and checksum offload features
   when configured as "Trunk". Driver has explicit checks to prevent
   enabling these offloads.

 - Issue 2: SYN packet drops seen at destination VM. When the packet
   originates, it has CHECKSUM_PARTIAL flag set and as it gets delivered to
   IO server's inbound Trunk ibmveth, on validating "checksum good" bits
   in ibmveth receive routine, SKB's ip_summed field is set with
   CHECKSUM_UNNECESSARY flag. This packet is then bridged by OVS (or Linux
   Bridge) and delivered to outbound Trunk ibmveth. At this point the
   outbound ibmveth transmit routine will not set "no checksum" and
   "checksum good" bits in transmit buffer descriptor, as it does so only
   when the ip_summed field is CHECKSUM_PARTIAL. When this packet gets
   delivered to destination VM, TCP layer receives the packet with checksum
   value of 0 and with no checksum related flags in ip_summed field. This
   leads to packet drops. So, TCP connections never goes through fine.

 - Issue 3: First packet of a TCP connection will be dropped, if there is
   no OVS flow cached in datapath. OVS while trying to identify the flow,
   computes the checksum. The computed checksum will be invalid at the
   receiving end, as ibmveth transmit routine zeroes out the pseudo
   checksum value in the packet. This leads to packet drop.

 - Issue 4: ibmveth driver doesn't have support for SKB's with frag_list.
   When Physical NIC has GRO enabled and when OVS bridges these packets,
   OVS vport send code will end up calling dev_queue_xmit, which in turn
   calls validate_xmit_skb.
   In validate_xmit_skb routine, the larger packets will get segmented into
   MSS sized segments, if SKB has a frag_list and if the driver to which
   they are delivered to doesn't support NETIF_F_FRAGLIST feature.

This patch addresses the above four issues, thereby enabling end to end
largesend and checksum offload support for better performance.

 - Fix for Issue 1 : Remove checks which prevent enabling TCP largesend and
   checksum offloads.
 - Fix for Issue 2 : When ibmveth receives a packet with "checksum good"
   bit set and if its configured in Trunk mode, set appropriate SKB fields
   using skb_partial_csum_set (ip_summed field is set with
   CHECKSUM_PARTIAL)
 - Fix for Issue 3: Recompute the pseudo header checksum before sending the
   SKB up the stack.
 - Fix for Issue 4: Linearize the SKBs with frag_list. Though we end up
   allocating buffers and copying data, this fix gives
   upto 4X throughput increase.

Note: All these fixes need to be dropped together as fixing just one of
them will lead to other issues immediately (especially for Issues 1,2 & 3).

Signed-off-by: Sivakumar Krishnasamy <ksiva@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21 13:29:01 -04:00
..
accessibility
acpi More ACPI updates for v4.12-rc1 2017-05-10 09:35:42 -07:00
amba
android
ata ARM: SoC driver updates 2017-05-09 10:01:15 -07:00
atm
auxdisplay
base More power management updates for v4.12-rc1 2017-05-10 09:12:30 -07:00
bcma
block virtio: fixes, cleanups, performance 2017-05-10 11:33:08 -07:00
bluetooth Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup() 2017-04-30 12:22:14 +02:00
bus
cdrom
char Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
clk Sort of on the quieter side this time, which is probably due more 2017-05-10 13:38:18 -07:00
clocksource Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-05-12 10:43:25 -07:00
connector
cpufreq Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-05-12 09:56:30 -07:00
cpuidle Merge branches 'pm-domains', 'pm-cpuidle', 'pm-sleep' and 'powercap' 2017-05-09 23:21:46 +02:00
crypto virtio: fixes, cleanups, performance 2017-05-10 11:33:08 -07:00
dax Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm 2017-05-12 15:43:10 -07:00
dca
devfreq
dio
dma dmaengine updates for 4.12-rc1 2017-05-09 15:40:28 -07:00
dma-buf
edac EDAC, amd64: Fix reporting of Chip Select sizes on Fam17h 2017-05-03 16:27:36 +02:00
eisa
extcon
firewire
firmware - fix bad EFI vars iterator usage 2017-05-16 13:29:07 -07:00
fmc
fpga fpga fr br: update supported version numbers 2017-04-26 11:38:56 +02:00
fsi
gpio Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
gpu drm/i915: Make vblank evade warnings optional 2017-05-12 14:28:02 +10:00
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2017-05-02 19:09:35 -07:00
hsi
hv char/misc patches for 4.12-rc1 2017-05-04 19:15:35 -07:00
hwmon hwmon: (coretemp) Handle frozen hotplug state correctly 2017-05-14 07:49:32 -07:00
hwspinlock
hwtracing drivers/hwtracing/intel_th/msu.c: use set_memory.h header 2017-05-08 17:15:14 -07:00
i2c i2c: xgene: Set ACPI_COMPANION_I2C 2017-05-17 09:21:06 +02:00
ide ide: don't call memcpy with the same source and destination 2017-05-08 17:36:39 -04:00
idle x86/intel_idle: add Gemini Lake support 2017-05-01 23:17:37 +02:00
iio Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
infiniband Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2017-05-12 11:44:13 -07:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2017-05-13 10:25:05 -07:00
iommu IOMMU Updates for Linux v4.12 2017-05-09 15:15:47 -07:00
ipack
irqchip Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-05-12 09:56:30 -07:00
isdn Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
leds scripts/spelling.txt: add "memory" pattern and fix typos 2017-05-08 17:15:13 -07:00
lguest
lightnvm lightnvm: fix bad back free on error path 2017-05-04 07:53:04 -06:00
macintosh DeviceTree for 4.12: 2017-05-05 19:33:07 -07:00
mailbox mailbox: handle empty message in tx_tick 2017-04-27 16:20:04 +05:30
mcb
md Merge tag 'md/4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-05-18 12:04:41 -07:00
media Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
memory MTD updates for 4.12-rc1: 2017-05-11 10:44:22 -07:00
memstick
message
mfd mfd: axp20x: Support AXP803 variant 2017-04-27 11:54:49 +01:00
misc Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
mmc Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
mtd This pull request contains updates for both UBI and UBIFS: 2017-05-13 10:23:12 -07:00
net ibmveth: Support to enable LSO/CSO for Trunk VEA. 2017-05-21 13:29:01 -04:00
nfc
ntb
nubus
nvdimm Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm 2017-05-12 15:43:10 -07:00
nvme Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2017-05-11 11:01:56 -07:00
nvmem ARM: SoC driver updates 2017-05-09 10:01:15 -07:00
of powerpc updates for 4.12 part 2 2017-05-12 10:04:09 -07:00
oprofile
parisc
parport
pci Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
pcmcia Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
perf
phy
pinctrl Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2017-05-02 19:09:35 -07:00
platform char/misc patches for 4.12-rc1 2017-05-04 19:15:35 -07:00
pnp
power power supply and reset changes for the v4.12 series (part 2) 2017-05-12 12:02:21 -07:00
powercap powercap: intel_rapl: Add support for Gemini Lake 2017-04-28 23:56:16 +02:00
pps
ps3
ptp Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-05-01 16:15:18 -07:00
pwm
rapidio char/misc patches for 4.12-rc1 2017-05-04 19:15:35 -07:00
ras
regulator Merge remote-tracking branch 'regulator/topic/vctrl' into regulator-next 2017-04-30 22:17:44 +09:00
remoteproc virtio: fixes, cleanups, performance 2017-05-10 11:33:08 -07:00
reset ARM: SoC driver updates 2017-05-09 10:01:15 -07:00
rpmsg virtio: fixes, cleanups, performance 2017-05-10 11:33:08 -07:00
rtc RTC for 4.12 2017-05-10 19:37:14 -07:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2017-05-16 09:24:44 -07:00
sbus
scsi qed: Utilize FW 8.20.0.0 2017-05-18 13:21:40 -04:00
sfi
sh
sn
soc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-05-15 15:50:49 -07:00
spi Merge remote-tracking branches 'spi/topic/ti-qspi' and 'spi/topic/xlp' into spi-next 2017-04-26 15:58:22 +01:00
spmi
ssb
staging Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
target Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2017-05-12 11:44:13 -07:00
tc
tee TEE driver infrastructure and OP-TEE drivers 2017-05-10 11:20:09 -07:00
thermal Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2017-05-12 11:58:45 -07:00
thunderbolt
tty Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
uio
usb DeviceTree for 4.12: 2017-05-05 19:33:07 -07:00
uwb
vfio powerpc updates for 4.12 part 1. 2017-05-05 11:36:44 -07:00
vhost vhost_net: try batch dequing from skb array 2017-05-18 10:07:42 -04:00
video fbdev changes for v4.12: 2017-05-11 11:12:26 -07:00
virt drivers/virt/fsl_hypervisor.c: use get_user_pages_unlocked() 2017-05-08 17:15:10 -07:00
virtio virtio: allow extra context per descriptor 2017-05-02 23:41:43 +03:00
vlynq
vme
w1
watchdog Annotation of module parameters that specify device settings 2017-05-10 19:13:03 -07:00
xen Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-09 09:12:53 -07:00
zorro
Kconfig
Makefile TEE driver infrastructure and OP-TEE drivers 2017-05-10 11:20:09 -07:00