Currently lguest has three threads: the main Launcher thread, a Waker
thread, and a thread for the block device (because synchronous block
was simply too painful to bear).
The Waker selects() on all the input file descriptors (eg. stdin, net
devices, pipe to the block thread) and when one becomes readable it calls
into the kernel to kick the Launcher thread out into userspace, which
repeats the poll, services the device(s), and then tells the kernel to
release the Waker before re-entering the kernel to run the Guest.
Also, to make a slightly-decent network transmit routine, the Launcher
would suppress further network interrupts while it set a timer: that
signal handler would write to a pipe, which would rouse the Waker
which would prod the Launcher out of the kernel to check the network
device again.
Now we can convert all our virtqueues to separate threads: each one has
a separate eventfd for when the Guest pokes the device, and can trigger
interrupts in the Guest directly.
The linecount shows how much this simplifies, but to really bring it
home, here's an strace analysis of single Guest->Host ping before:
* Guest sends packet, notifies xmit vq, return control to Launcher
* Launcher clears notification flag on xmit ring
* Launcher writes packet to TUN device
writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\366\r\224`\2058\272m\224vf\274\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
* Launcher sets up interrupt for Guest (xmit ring is empty)
write(10, "\2\0\0\0\3\0\0\0", 8) = 0
* Launcher sets up timer for interrupt mitigation
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 505}}, NULL) = 0
* Launcher re-runs guest
pread64(10, 0xbfa5f4d4, 4, 0) ...
* Waker notices reply packet in tun device (it was in select)
select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [4])
* Waker kicks Launcher out of guest:
pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
* Launcher returns from running guest:
... = -1 EAGAIN (Resource temporarily unavailable)
* Launcher looks at input fds:
select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [4], left {0, 0})
* Launcher reads pong from tun device:
readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\272m\224vf\274\366\r\224`\2058\10\0E\0\0T\364\26\0\0@"..., 1518}], 2) = 108
* Launcher injects guest notification:
write(10, "\2\0\0\0\2\0\0\0", 8) = 0
* Launcher rechecks fds:
select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
* Launcher clears Waker:
pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
* Launcher reruns Guest:
pread64(10, 0xbfa5f4d4, 4, 0) = ? ERESTARTSYS (To be restarted)
* Signal comes in, uses pipe to wake up Launcher:
--- SIGALRM (Alarm clock) @ 0 (0) ---
write(8, "\0", 1) = 1
sigreturn() = ? (mask now [])
* Waker sees write on pipe:
select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [6])
* Waker kicks Launcher out of Guest:
pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
* Launcher exits from kernel:
pread64(10, 0xbfa5f4d4, 4, 0) = -1 EAGAIN (Resource temporarily unavailable)
* Launcher looks to see what fd woke it:
select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0})
* Launcher reads timeout fd, sets notification flag on xmit ring
read(6, "\0", 32) = 1
* Launcher rechecks fds:
select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
* Launcher clears Waker:
pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
* Launcher resumes Guest:
pread64(10, "\0p\0\4", 4, 0) ....
strace analysis of single Guest->Host ping after:
* Guest sends packet, notifies xmit vq, creates event on eventfd.
* Network xmit thread wakes from read on eventfd:
read(7, "\1\0\0\0\0\0\0\0", 8) = 8
* Network xmit thread writes packet to TUN device
writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"J\217\232FI\37j\27\375\276\0\304\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
* Network recv thread wakes up from read on tunfd:
readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"j\27\375\276\0\304J\217\232FI\37\10\0E\0\0TiO\0\0@\1\214"..., 1518}], 2) = 108
* Network recv thread sets up interrupt for the Guest
write(6, "\2\0\0\0\2\0\0\0", 8) = 0
* Network recv thread goes back to reading tunfd
13:39:42.460285 readv(4, <unfinished ...>
* Network xmit thread sets up interrupt for Guest (xmit ring is empty)
write(6, "\2\0\0\0\3\0\0\0", 8) = 0
* Network xmit thread goes back to reading from eventfd
read(7, <unfinished ...>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>