linux_dsm_epyc7002/drivers/block
Roger Pau Monne 0a8704a51f xen/blkback: Persistent grant maps for xen blk drivers
This patch implements persistent grants for the xen-blk{front,back}
mechanism. The effect of this change is to reduce the number of unmap
operations performed, since they cause a (costly) TLB shootdown. This
allows the I/O performance to scale better when a large number of VMs
are performing I/O.

Previously, the blkfront driver was supplied a bvec[] from the request
queue. This was granted to dom0; dom0 performed the I/O and wrote
directly into the grant-mapped memory and unmapped it; blkfront then
removed foreign access for that grant. The cost of unmapping scales
badly with the number of CPUs in Dom0. An experiment showed that when
Dom0 has 24 VCPUs, and guests are performing parallel I/O to a
ramdisk, the IPIs from performing unmap's is a bottleneck at 5 guests
(at which point 650,000 IOPS are being performed in total). If more
than 5 guests are used, the performance declines. By 10 guests, only
400,000 IOPS are being performed.

This patch improves performance by only unmapping when the connection
between blkfront and back is broken.

On startup blkfront notifies blkback that it is using persistent
grants, and blkback will do the same. If blkback is not capable of
persistent mapping, blkfront will still use the same grants, since it
is compatible with the previous protocol, and simplifies the code
complexity in blkfront.

To perform a read, in persistent mode, blkfront uses a separate pool
of pages that it maps to dom0. When a request comes in, blkfront
transmutes the request so that blkback will write into one of these
free pages. Blkback keeps note of which grefs it has already
mapped. When a new ring request comes to blkback, it looks to see if
it has already mapped that page. If so, it will not map it again. If
the page hasn't been previously mapped, it is mapped now, and a record
is kept of this mapping. Blkback proceeds as usual. When blkfront is
notified that blkback has completed a request, it memcpy's from the
shared memory, into the bvec supplied. A record that the {gref, page}
tuple is mapped, and not inflight is kept.

Writes are similar, except that the memcpy is peformed from the
supplied bvecs, into the shared pages, before the request is put onto
the ring.

Blkback stores a mapping of grefs=>{page mapped to by gref} in
a red-black tree. As the grefs are not known apriori, and provide no
guarantees on their ordering, we have to perform a search
through this tree to find the page, for every gref we receive. This
operation takes O(log n) time in the worst case. In blkfront grants
are stored using a single linked list.

The maximum number of grants that blkback will persistenly map is
currently set to RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, to
prevent a malicios guest from attempting a DoS, by supplying fresh
grefs, causing the Dom0 kernel to map excessively. If a guest
is using persistent grants and exceeds the maximum number of grants to
map persistenly the newly passed grefs will be mapped and unmaped.
Using this approach, we can have requests that mix persistent and
non-persistent grants, and we need to handle them correctly.
This allows us to set the maximum number of persistent grants to a
lower value than RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, although
setting it will lead to unpredictable performance.

In writing this patch, the question arrises as to if the additional
cost of performing memcpys in the guest (to/from the pool of granted
pages) outweigh the gains of not performing TLB shootdowns. The answer
to that question is `no'. There appears to be very little, if any
additional cost to the guest of using persistent grants. There is
perhaps a small saving, from the reduced number of hypercalls
performed in granting, and ending foreign access.

Signed-off-by: Oliver Chick <oliver.chick@citrix.com>
Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v1: Fixed up the misuse of bool as int]
2012-10-30 09:50:04 -04:00
..
aoe aoe: update aoe-internal version number to 50 2012-10-06 03:05:30 +09:00
drbd block: Generalized bio pool freeing 2012-09-09 10:35:38 +02:00
mtip32xx mtip32xx: fix user_buffer check in exec_drive_command 2012-09-12 22:21:13 +02:00
paride paride/pcd: fix bool verbose module parameter. 2012-01-13 09:32:26 +10:30
xen-blkback xen/blkback: Persistent grant maps for xen blk drivers 2012-10-30 09:50:04 -04:00
amiflop.c fs: move code out of buffer.c 2012-01-03 22:54:07 -05:00
ataflop.c block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers 2011-04-21 21:33:05 +02:00
brd.c block: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:16 +08:00
cciss_cmd.h cciss: use new doorbell-bit-5 reset method 2011-05-06 08:23:55 -06:00
cciss_scsi.c cciss: fix handling of protocol error 2012-09-18 11:57:08 +02:00
cciss_scsi.h cciss: add cciss_tape_cmds module paramter 2011-05-06 08:23:59 -06:00
cciss.c block: add and use scsi_blk_cmd_ioctl 2012-01-14 15:07:24 -08:00
cciss.h cciss: Adds simple mode functionality 2011-08-08 11:40:15 +02:00
cpqarray.c drivers/block/cpqarray.c: use pci_dev->revision 2011-09-21 10:02:13 +02:00
cpqarray.h
cryptoloop.c drivers: Remove unnecessary inclusions of asm/semaphore.h 2008-04-18 22:16:32 -04:00
DAC960.c dac960: Remove unused variables from DAC960_CreateProcEntries() 2012-05-11 16:42:14 +02:00
DAC960.h
floppy.c workqueue: deprecate __cancel_delayed_work() 2012-08-21 13:18:24 -07:00
hd.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
ida_cmd.h
ida_ioctl.h
Kconfig block: remove the deprecated ub driver 2012-09-05 17:18:53 -07:00
loop.c userns: Convert loop to use kuid_t instead of uid_t 2012-09-21 03:13:20 -07:00
Makefile block: remove the deprecated ub driver 2012-09-05 17:18:53 -07:00
mg_disk.c mg_disk: Use struct dev_pm_ops for power management 2012-07-06 19:07:00 +02:00
nbd.c nbd: handle discard requests 2012-10-06 03:05:24 +09:00
nvme.c PCI changes for the 3.7 merge window: 2012-10-01 12:05:36 -07:00
osdblk.c block: Add bio_clone_bioset(), bio_clone_kmalloc() 2012-09-09 10:35:39 +02:00
pktcdvd.c pktcdvd: Switch to bio_kmalloc() 2012-09-09 10:35:39 +02:00
ps3disk.c block: Fix files that are modules and hence need module.h 2011-10-31 19:31:13 -04:00
ps3vram.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
rbd_types.h rbd: define some new format constants 2012-10-01 14:30:53 -05:00
rbd.c rbd: BUG on invalid layout 2012-10-01 17:20:00 -05:00
smart1,2.h fix typos 'comamnd' -> 'command' in comments 2011-02-02 11:31:21 +01:00
sunvdc.c powerpc+sparc/vio: Modernize driver registration 2012-03-28 11:33:24 +11:00
swim3.c block/swim3: Locking fixes 2011-12-12 12:42:12 +01:00
swim_asm.S m68k: mac - Add SWIM floppy support 2009-03-26 21:15:27 +01:00
swim.c m68k/mac: cleanup forward declarations 2011-12-10 19:52:46 +01:00
sx8.c block, sx8: fix pointer math issue getting fw version 2012-03-03 19:44:39 +01:00
umem.c blk: pass from_schedule to non-request unplug functions. 2012-07-31 09:08:15 +02:00
umem.h
virtio_blk.c virtio-blk: Disable callback in virtblk_done() 2012-09-28 15:05:16 +09:30
xd.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
xd.h [PATCH] switch xd 2008-10-21 07:48:11 -04:00
xen-blkfront.c xen/blkback: Persistent grant maps for xen blk drivers 2012-10-30 09:50:04 -04:00
xsysace.c block: xsysace: Don't use NO_IRQ 2012-01-05 08:34:29 +01:00
z2ram.c drivers/block/z2ram.c: correct printing of sector_t 2010-10-28 06:15:26 -06:00