Try to grab the MSI-X vectors early and fall back to the shared one
before doing lots of allocations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This lets IRQ layer handle dispatching IRQs to separate handlers for the
case where we don't have per-VQ MSI-X vectors, and allows us to greatly
simplify the code based on the assumption that we always have interrupt
vector 0 (legacy INTx or config interrupt for MSI-X) available, and
any other interrupt is request/freed throught the VQ, even if the
actual interrupt line might be shared in some cases.
This allows removing a great deal of variables keeping track of the
interrupt state in struct virtio_pci_device, as we can now simply walk the
list of VQs and deal with per-VQ interrupt handlers there, and only treat
vector 0 special.
Additionally clean up the VQ allocation code to properly unwind on error
instead of having a single global cleanup label, which is error prone,
and in this case also leads to more code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We don't really need struct virtio_pci_vq_info, as most field in there
are redundant:
- the vq backpointer is not strictly neede to start with
- the entry in the vqs list is not needed - the generic virtqueue already
has list, we only need to check if it has a callback to get the same
semantics
- we can use a simple array to look up the MSI-X vec if needed.
- That simple array now also duoble serves to replace the per_vq_vectors
flag
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
With CONFIG_BALLOON_COMPACTION=y the kernel will mount balloon_mnt for
balloon page migration when we probe a virtio_balloon device. However
we do not unmount it when removing the device. Fix this.
Fixes: b1123ea6d3 ("mm: balloon: use general non-lru movable page feature")
Link: http://lkml.kernel.org/r/1486531318-35189-1-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The conflict was an interaction between a bug fix in the
netvsc driver in 'net' and an optimization of the RX path
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
For XDP we will need to reset the queues to allow for buffer headroom
to be configured. In order to do this we need to essentially run the
freeze()/restore() code path. Unfortunately the locking requirements
between the freeze/restore and reset paths are different however so
we can not simply reuse the code.
This patch refactors the code path and adds a reset helper routine.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit c7070619f3.
This has been shown to regress on some ARM systems:
by forcing on DMA API usage for ARM systems, we have inadvertently
kicked open a hornets' nest in terms of cache-coherency. Namely that
unless the virtio device is explicitly described as capable of coherent
DMA by firmware, the DMA APIs on ARM and other DT-based platforms will
assume it is non-coherent. This turns out to cause a big problem for the
likes of QEMU and kvmtool, which generate virtio-mmio devices in their
guest DTs but neglect to add the often-overlooked "dma-coherent"
property; as a result, we end up with the guest making non-cacheable
accesses to the vring, the host doing so cacheably, both talking past
each other and things going horribly wrong.
We are working on a safer work-around.
Fixes: c7070619f3 ("vring: Force use of DMA API for ARM-based systems with legacy devices")
Reported-by: Robin Murphy <robin.murphy@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Booting Linux on an ARM fastmodel containing an SMMU emulation results
in an unexpected I/O page fault from the legacy virtio-blk PCI device:
[ 1.211721] arm-smmu-v3 2b400000.smmu: event 0x10 received:
[ 1.211800] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010
[ 1.211880] arm-smmu-v3 2b400000.smmu: 0x0000020800000000
[ 1.211959] arm-smmu-v3 2b400000.smmu: 0x00000008fa081002
[ 1.212075] arm-smmu-v3 2b400000.smmu: 0x0000000000000000
[ 1.212155] arm-smmu-v3 2b400000.smmu: event 0x10 received:
[ 1.212234] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010
[ 1.212314] arm-smmu-v3 2b400000.smmu: 0x0000020800000000
[ 1.212394] arm-smmu-v3 2b400000.smmu: 0x00000008fa081000
[ 1.212471] arm-smmu-v3 2b400000.smmu: 0x0000000000000000
<system hangs failing to read partition table>
This is because the legacy virtio-blk device is behind an SMMU, so we
have consequently swizzled its DMA ops and configured the SMMU to
translate accesses. This then requires the vring code to use the DMA API
to establish translations, otherwise all transactions will result in
fatal faults and termination.
Given that ARM-based systems only see an SMMU if one is really present
(the topology is all described by firmware tables such as device-tree or
IORT), then we can safely use the DMA API for all legacy virtio devices.
Modern devices can advertise the prescense of an IOMMU using the
VIRTIO_F_IOMMU_PLATFORM feature flag.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: <stable@vger.kernel.org>
Fixes: 876945dbf6 ("arm64: Hook up IOMMU dma_ops")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Once DMA API usage is enabled, it becomes apparent that virtio-mmio is
inadvertently relying on the default 32-bit DMA mask, which leads to
problems like rapidly exhausting SWIOTLB bounce buffers.
Ensure that we set the appropriate 64-bit DMA mask whenever possible,
with the coherent mask suitably limited for the legacy vring as per
a0be1db430 ("virtio_pci: Limit DMA mask to 44 bits for legacy virtio
devices").
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Fixes: b42111382f ("virtio_mmio: Use the DMA API if enabled")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fix a warning thrown from virtio_mmio_remove():
Device 'virtio0' does not have a release() function
The fix is according to virtio_pci_probe() of
drivers/virtio/virtio_pci_common.c
Signed-off-by: Yuan Liu <liuyuan@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The device (not the driver) populates the used ring and includes the len
of how much data was written.
Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There is basically no shared logic between the INTx and MSI-X case in
vp_try_to_find_vqs, so split the function into two and clean them up
a little bit.
Also remove the fairly pointless vp_request_intx wrapper while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vp_request_msix_vectors is only called by vp_try_to_find_vqs, which already
calls vp_free_vectors through vp_del_vqs in the failure case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This avoids the separate allocation for the msix_entries structures, and
instead allows us to use pci_irq_vector to find a given IRQ vector.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# make C=2 CF="-D__CHECK_ENDIAN__" ./drivers/virtio/
drivers/virtio/virtio_ring.c:423:19: warning: incorrect type in assignment (different base types)
drivers/virtio/virtio_ring.c:423:19: expected unsigned int [unsigned] [assigned] i
drivers/virtio/virtio_ring.c:423:19: got restricted __virtio16 [usertype] next
drivers/virtio/virtio_ring.c:423:19: warning: incorrect type in assignment (different base types)
drivers/virtio/virtio_ring.c:423:19: expected unsigned int [unsigned] [assigned] i
drivers/virtio/virtio_ring.c:423:19: got restricted __virtio16 [usertype] next
drivers/virtio/virtio_ring.c:423:19: warning: incorrect type in assignment (different base types)
drivers/virtio/virtio_ring.c:423:19: expected unsigned int [unsigned] [assigned] i
drivers/virtio/virtio_ring.c:423:19: got restricted __virtio16 [usertype] next
drivers/virtio/virtio_ring.c:604:39: warning: incorrect type in initializer (different base types)
drivers/virtio/virtio_ring.c:604:39: expected unsigned short [unsigned] [usertype] nextflag
drivers/virtio/virtio_ring.c:604:39: got restricted __virtio16
drivers/virtio/virtio_ring.c:612:33: warning: restricted __virtio16 degrades to integer
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYHmoCAAoJEHm+PkMAQRiG7RMIAI2i7Y5hpL5yCxK5AFaL4u/G
KxXfp1B1UanUTgjOmd7zGqtDYcFX9t7GTTUFixQ7/9Opr4PD9qbnatoDGSc3xjbT
msDgA1B78F1/Q3kHWfeGq32MihQ4mj5NwUCo+igUcUvvWG7mHgzErj/Nh5RoobQX
p/izdpTbrw3GX6xXB8olbG7XWHaVye/+TT3q6+gmgm8I/QEujcLeGoycE0zlhPN8
FG/JX76At/+ZM2Py7Oxo3k+oKL9CHrtOQYDp/wN0uslV5eYvvkZz0/M1HMOGZt+c
gZU5jzM17K7C4Nzo06WAuBU9wUBGc25m+cPicLlOmljnzfU+f50SKaDjZq3p7QI=
=2KUF
-----END PGP SIGNATURE-----
Merge tag 'v4.9-rc4' into sound
Bring in -rc4 patches so I can successfully merge the sound doc changes.
This inline function is unused on configurations
where dma_map/unmap are empty macros.
Make the function inline to avoid gcc errors because
of an unused static function.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The following commit 'fad7b7b27b6a (virtio_balloon: Use a workqueue
instead of "vballoon" kthread)' has added a regression. Original code with
kthread starts the thread inside probe and checks the necessity to update
balloon inside the thread immediately.
Nowadays the code behaves differently. Work is queued only on the first
command from the host after the negotiation. Thus there is a window
especially at the guest startup or the module reloading when the balloon
size is not updated until the notification from the host.
This patch adds balloon size check at the end of the probe to match
original behaviour.
Signed-off-by: Konstantin Neumoin <kneumoin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
According to the spec, if the VIRTIO_RING_F_EVENT_IDX feature bit is
negotiated the driver MUST set flags to 0. Not dirtying the available
ring in virtqueue_disable_cb also has a minor positive performance
impact, improving L1 dcache load missed by ~0.5% in vring_bench.
Writes to the used event field (vring_used_event) are still unconditional.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: <stable@vger.kernel.org> # f277ec4 virtio_ring: shadow available
Cc: <stable@vger.kernel.org>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Legacy virtio defines the virtqueue base using a 32-bit PFN field, with
a read-only register indicating a fixed page size of 4k.
This can cause problems for DMA allocators that allocate top down from
the DMA mask, which is set to 64 bits. In this case, the addresses are
silently truncated to 44-bit, leading to IOMMU faults, failure to read
from the queue or data corruption.
This patch restricts the coherent DMA mask for legacy PCI virtio devices
to 44 bits, which matches the specification.
Cc: stable@vger.kernel.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Benjamin Serebrin <serebrin@google.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The previous patch renamed several files that are cross-referenced
along the Kernel documentation. Adjust the links to point to
the right places.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
We get 1 warning when building kernel with W=1:
drivers/virtio/virtio_ring.c:170:16: warning: no previous prototype for 'vring_dma_dev' [-Wmissing-prototypes]
In fact, this function is only used in the file in which it is
declared and don't need a declaration, but can be made static.
so this patch marks this function with 'static'.
Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
On error, virtqueue_add calls START_USE but not
END_USE. Thankfully that's normally empty anyway,
but might not be when debugging. Fix it up.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When using the indirect buffers feature, 'desc' is allocated in
virtqueue_add() but isn't freed before leaving on a ring full error,
causing a memory leak.
For example, it seems rather clear that this can trigger
with virtio net if mergeable buffers are not used.
Cc: stable@vger.kernel.org
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
- New vsock device support in host and guest
- Platform IOMMU support in host and guest,
including compatibility quirks for legacy systems.
- Misc fixes and cleanups.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXofvbAAoJECgfDbjSjVRpUTIH/iEoK9h636tBayXy0PXkPby0
6fMaRFy6H1HgEttgDhJE8Pqg/ba3qaW9Em0fHyFq7Mp2waFHAZ8hAT8phC6TAK3c
CIBnfzyyuI8u3N9SnNOfelPVcwCBfuALuuTsXB/rwKbYQEVv+U5Rdt3Vyx9+lXkj
P005klz7PfqxFhQrrnj4Eh7VawtHwmMuLH8YoWpCZpM71dHPo6eL+3ftKwhH2boo
qK86uVprwba03Pewpm13vQnotemfVfUUkjXd4EJpG3dx7E0KZosuj0ZG9OV8mPGQ
Cl2gBdUhocdJgeUnAHmf6tumYi9KFlYfy6xLy44YMmN7FL3E9nQjaKZp25UKfiM=
=ztIm
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio/vhost updates from Michael Tsirkin:
- new vsock device support in host and guest
- platform IOMMU support in host and guest, including compatibility
quirks for legacy systems.
- misc fixes and cleanups.
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
VSOCK: Use kvfree()
vhost: split out vringh Kconfig
vhost: detect 32 bit integer wrap around
vhost: new device IOTLB API
vhost: drop vringh dependency
vhost: convert pre sorted vhost memory array to interval tree
vhost: introduce vhost memory accessors
VSOCK: Add Makefile and Kconfig
VSOCK: Introduce vhost_vsock.ko
VSOCK: Introduce virtio_transport.ko
VSOCK: Introduce virtio_vsock_common.ko
VSOCK: defer sock removal to transports
VSOCK: transport-specific vsock_transport functions
vhost: drop vringh dependency
vop: pull in vhost Kconfig
virtio: new feature to detect IOMMU device quirk
balloon: check the number of available pages in leak balloon
vhost: lockless enqueuing
vhost: simplify work flushing
The interaction between virtio and IOMMUs is messy.
On most systems with virtio, physical addresses match bus addresses,
and it doesn't particularly matter which one we use to program
the device.
On some systems, including Xen and any system with a physical device
that speaks virtio behind a physical IOMMU, we must program the IOMMU
for virtio DMA to work at all.
On other systems, including SPARC and PPC64, virtio-pci devices are
enumerated as though they are behind an IOMMU, but the virtio host
ignores the IOMMU, so we must either pretend that the IOMMU isn't
there or somehow map everything as the identity.
Add a feature bit to detect that quirk: VIRTIO_F_IOMMU_PLATFORM.
Any device with this feature bit set to 0 needs a quirk and has to be
passed physical addresses (as opposed to bus addresses) even though
the device is behind an IOMMU.
Note: it has to be a per-device quirk because for example, there could
be a mix of passed-through and virtual virtio devices. As another
example, some devices could be implemented by an out of process
hypervisor backend (in case of qemu vhost, or vhost-user) and so support
for an IOMMU needs to be coded up separately.
It would be cleanest to handle this in IOMMU core code, but that needs
per-device DMA ops. While we are waiting for that to be implemented, use
a work-around in virtio core.
Note: a "noiommu" feature is a quirk - add a wrapper to make
that clear.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The balloon has a special mechanism that is subscribed to the oom
notification which leads to deflation for a fixed number of pages.
The number is always fixed even when the balloon is fully deflated.
But leak_balloon did not expect that the pages to deflate will be more
than taken, and raise a "BUG" in balloon_page_dequeue when page list
will be empty.
So, the simplest solution would be to check that the number of releases
pages is less or equal to the number taken pages.
Cc: stable@vger.kernel.org
Signed-off-by: Konstantin Neumoin <kneumoin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Randy reported below build error.
> In file included from ../include/linux/balloon_compaction.h:48:0,
> from ../mm/balloon_compaction.c:11:
> ../include/linux/compaction.h:237:51: warning: 'struct node' declared inside parameter list [enabled by default]
> static inline int compaction_register_node(struct node *node)
> ../include/linux/compaction.h:237:51: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
> ../include/linux/compaction.h:242:54: warning: 'struct node' declared inside parameter list [enabled by default]
> static inline void compaction_unregister_node(struct node *node)
>
It was caused by non-lru page migration which needs compaction.h but
compaction.h doesn't include any header to be standalone.
I think proper header for non-lru page migration is migrate.h rather
than compaction.h because migrate.h has already headers needed to work
non-lru page migration indirectly like isolate_mode_t, migrate_mode
MIGRATEPAGE_SUCCESS.
[akpm@linux-foundation.org: revert mm-balloon-use-general-non-lru-movable-page-feature-fix.patch temp fix]
Link: http://lkml.kernel.org/r/20160610003304.GE29779@bbox
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now, VM has a feature to migrate non-lru movable pages so balloon
doesn't need custom migration hooks in migrate.c and compaction.c.
Instead, this patch implements the page->mapping->a_ops->
{isolate|migrate|putback} functions.
With that, we could remove hooks for ballooning in general migration
functions and make balloon compaction simple.
[akpm@linux-foundation.org: compaction.h requires that the includer first include node.h]
Link: http://lkml.kernel.org/r/1464736881-24886-4-git-send-email-minchan@kernel.org
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Everything should be LE when using virtio-1, but
the linux balloon driver does not seem to care about that.
Reported-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Smatch complains that we might not initialize "queue". The issue is
callers like setup_vq() from virtio_pci_modern.c where "num" could be
something like 2 and "vring_align" is 64. In that case, vring_size() is
less than PAGE_SIZE. It won't happen in real life, but we're getting
the value of "num" from a register so it's not really possible to tell
what value it holds with static analysis.
Let's just silence the warning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The spec says: after writing 0 to device_status, the driver MUST wait
for a read of device_status to return 0 before reinitializing the
device.
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
This adds basic polling support for vhost.
Reworks virtio to optionally use DMA API, fixing it on Xen.
Balloon stats gained a new entry.
Using the new napi_alloc_skb speeds up virtio net.
virtio blk stats can now be read while another VCPU
us busy inflating or deflating the balloon.
Plus misc cleanups in various places.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJW7qRJAAoJECgfDbjSjVRpVNoH/A7z+lZ6nooSJ9fUBtAAlwit
mE1VKi8g0G6naV1NVLFVe7hPAejExGiHfR3ZUrVoenJKj2yeW/DFojFC10YR/KTe
ac7Imuc+owA3UOE/QpeGBs59+EEWKTZUYt6r8HSJVwoodeosw9v2ecP/Iwhbax8H
a4V3HqOADjKnHg73R9o3u+bAgA1GrGYHeK0AfhCBSTNwlPdxkvf0463HgfOpM4nl
/sNoFWO3vOyekk+loIk+jpmWVIoIfG2NFzW4lPwEPkfqUBX7r0ei/NR23hIqHL7r
QZ6vMj1Ew9qctUONbJu4kXjuV2Vk9NhxwbDjoJtm8plKL2hz2prJynUEogkHh2g=
=VMD0
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio/vhost updates from Michael Tsirkin:
"New features, performance improvements, cleanups:
- basic polling support for vhost
- rework virtio to optionally use DMA API, fixing it on Xen
- balloon stats gained a new entry
- using the new napi_alloc_skb speeds up virtio net
- virtio blk stats can now be read while another VCPU is busy
inflating or deflating the balloon
plus misc cleanups in various places"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_net: replace netdev_alloc_skb_ip_align() with napi_alloc_skb()
vhost_net: basic polling support
vhost: introduce vhost_vq_avail_empty()
vhost: introduce vhost_has_work()
virtio_balloon: Allow to resize and update the balloon stats in parallel
virtio_balloon: Use a workqueue instead of "vballoon" kthread
virtio/s390: size of SET_IND payload
virtio/s390: use dev_to_virtio
vhost: rename vhost_init_used()
vhost: rename cross-endian helpers
virtio_blk: VIRTIO_BLK_F_WCE->VIRTIO_BLK_F_FLUSH
vring: Use the DMA API on Xen
virtio_pci: Use the DMA API if enabled
virtio_mmio: Use the DMA API if enabled
virtio: Add improved queue allocation API
virtio_ring: Support DMA APIs
vring: Introduce vring_use_dma_api()
s390/dma: Allow per device dma ops
alpha/dma: use common noop dma ops
dma: Provide simple noop dma ops
Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory
statistics protocol, corresponding to 'Available' in /proc/meminfo.
It indicates to the hypervisor how big the balloon can be inflated
without pushing the guest system to swap.
Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The virtio balloon statistics are not updated when the balloon
is being resized. But it seems that both tasks could be done
in parallel.
stats_handle_request() updates the statistics in the balloon
structure and then communicates with the host.
update_balloon_stats() calls all_vm_events() that just reads
some per-CPU variables. The values might change during and
after the call but it is expected and happens even without
this patch.
update_balloon_stats() also calls si_meminfo(). It is a bit
more complex function. It too just reads some variables and
looks lock-less safe. In each case, it seems to be called
lock-less on several similar locations, e.g. from post_status()
in dm_thread_func(), or from vmballoon_send_get_target().
The communication with the host is done via a separate virtqueue,
see vb->stats_vq vs. vb->inflate_vq and vb->deflate_vq. Therefore
it could be used in parallel with fill_balloon() and leak_balloon().
This patch splits the existing work into two pieces. One is for
updating the balloon stats. The other is for resizing of the balloon.
It seems that they can be proceed in parallel without any
extra locking.
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch moves the deferred work from the "vballoon" kthread into a
system freezable workqueue.
We do not need to maintain and run a dedicated kthread. Also the event
driven workqueues API makes the logic much easier. Especially, we do
not longer need an own wait queue, wait function, and freeze point.
The conversion is pretty straightforward. One cycle of the main loop
is put into a work. The work is queued instead of waking the kthread.
fill_balloon() and leak_balloon() have a limit for the amount of modified
pages. The work re-queues itself when necessary. For this, we make
fill_balloon() to return the number of really modified pages.
Note that leak_balloon() already did this.
virtballoon_restore() queues the work only when really needed.
The only complication is that we need to prevent queuing the work
when the balloon is being removed. It was easier before because the
kthread simply removed itself from the wait queue. We need an
extra boolean and spin lock now.
My initial idea was to use a dedicated workqueue. Michael S. Tsirkin
suggested using a system one. Tejun Heo confirmed that the system
workqueue has a pretty high concurrency level (256) by default.
Therefore we need not be afraid of too long blocking.
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce PCI_VENDOR/PCI_SUBVENDOR/PCI_SUBDEVICE defines to replace the
constants scattered in the kernel already used to detect QEMU.
They are defined in the QEMU codebase per docs/specs/pci-ids.txt.
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
This switches to vring_create_virtqueue, simplifying the driver and
adding DMA API support.
This fixes virtio-pci on platforms and busses that have IOMMUs. This
will break the experimental QEMU Q35 IOMMU support until QEMU is
fixed. In exchange, it fixes physical virtio hardware as well as
virtio-pci running under Xen.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This switches to vring_create_virtqueue, simplifying the driver and
adding DMA API support.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This leaves vring_new_virtqueue alone for compatbility, but it
adds two new improved APIs:
vring_create_virtqueue: Creates a virtqueue backed by automatically
allocated coherent memory. (Some day it this could be extended to
support non-coherent memory, too, if there ends up being a platform
on which it's worthwhile.)
__vring_new_virtqueue: Creates a virtqueue with a manually-specified
layout. This should allow mic_virtio to work much more cleanly.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio_ring currently sends the device (usually a hypervisor)
physical addresses of its I/O buffers. This is okay when DMA
addresses and physical addresses are the same thing, but this isn't
always the case. For example, this never works on Xen guests, and
it is likely to fail if a physical "virtio" device ever ends up
behind an IOMMU or swiotlb.
The immediate use case for me is to enable virtio on Xen guests.
For that to work, we need vring to support DMA address translation
as well as a corresponding change to virtio_pci or to another
driver.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This is a kludge, but no one has come up with a a better idea yet.
We'll introduce DMA API support guarded by vring_use_dma_api().
Eventually we may be able to return true on more and more systems,
and hopefully we can get rid of vring_use_dma_api() entirely some
day.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Looks like a copy-paste bug. The value is used as an optimization and a
wrong value probably isn't causing any serious damage. Found when
porting this code to Windows.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
KASan detected a use-after-free error in virtio-pci remove code. In
virtio_pci_remove(), vp_dev is still used after being freed in
unregister_virtio_device() (in virtio_pci_release_dev() more
precisely).
To fix, keep a reference until cleanup is done.
Fixes: 63bd62a08c ("virtio_pci: defer kfree until release callback")
Reported-by: Jerome Marchand <jmarchan@redhat.com>
Cc: stable@vger.kernel.org
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Jerome Marchand <jmarchan@redhat.com>
checkpatch.pl wants arrays of strings declared as follows:
static const char * const names[] = { "vq-1", "vq-2", "vq-3" };
Currently the find_vqs() function takes a const char *names[] argument
so passing checkpatch.pl's const char * const names[] results in a
compiler error due to losing the second const.
This patch adjusts the find_vqs() prototype and updates all virtio
transports. This makes it possible for virtio_balloon.c, virtio_input.c,
virtgpu_kms.c, and virtio_rpmsg_bus.c to use the checkpatch.pl-friendly
type.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
During my compaction-related stuff, I encountered a bug
with ballooning.
With repeated inflating and deflating cycle, guest memory(
ie, cat /proc/meminfo | grep MemTotal) is decreased and
couldn't be recovered.
The reason is balloon_lock doesn't cover release_pages_balloon
so struct virtio_balloon fields could be overwritten by race
of fill_balloon(e,g, vb->*pfns could be critical).
This patch fixes it in my test.
Cc: <stable@vger.kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We need a full barrier after writing out event index, using
virt_store_mb there seems better than open-coding. As usual, we need a
wrapper to account for strong barriers.
It's tempting to use this in vhost as well, for that, we'll
need a variant of smp_store_mb that works on __user pointers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Improves cacheline transfer flow of available ring header.
Virtqueues are implemented as a pair of rings, one producer->consumer
avail ring and one consumer->producer used ring; preceding the
avail ring in memory are two contiguous u16 fields -- avail->flags
and avail->idx. A producer posts work by writing to avail->idx and
a consumer reads avail->idx.
The flags and idx fields only need to be written by a producer CPU
and only read by a consumer CPU; when the producer and consumer are
running on different CPUs and the virtio_ring code is structured to
only have source writes/sink reads, we can continuously transfer the
avail header cacheline between 'M' states between cores. This flow
optimizes core -> core bandwidth on certain CPUs.
(see: "Software Optimization Guide for AMD Family 15h Processors",
Section 11.6; similar language appears in the 10h guide and should
apply to CPUs w/ exclusive caches, using LLC as a transfer cache)
Unfortunately the existing virtio_ring code issued reads to the
avail->idx and read-modify-writes to avail->flags on the producer.
This change shadows the flags and index fields in producer memory;
the vring code now reads from the shadows and only ever writes to
avail->flags and avail->idx, allowing the cacheline to transfer
core -> core optimally.
In a concurrent version of vring_bench, the time required for
10,000,000 buffer checkout/returns was reduced by ~2% (average
across many runs) on an AMD Piledriver (15h) CPU:
(w/o shadowing):
Performance counter stats for './vring_bench':
5,451,082,016 L1-dcache-loads
...
2.221477739 seconds time elapsed
(w/ shadowing):
Performance counter stats for './vring_bench':
5,405,701,361 L1-dcache-loads
...
2.168405376 seconds time elapsed
The further away (in a NUMA sense) virtio producers and consumers are
from each other, the more we expect to benefit. Physical implementations
of virtio devices and implementations of virtio where the consumer polls
vring avail indexes (vhost) should also benefit.
Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
b92b1b89a3 ("virtio: force vring descriptors to be allocated from
lowmem") tried to exclude highmem pages for descriptors so it cleared
__GFP_HIGHMEM from a given gfp mask. The patch also cleared __GFP_HIGH
which doesn't make much sense for this fix because __GFP_HIGH only
controls access to memory reserves and it doesn't have any influence
on the zone selection. Some of the call paths use GFP_ATOMIC and
dropping __GFP_HIGH will reduce their changes for success because the
lack of access to memory reserves.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Mel Gorman <mgorman@techsingularity.net>
The virtio core uses a static ida named virtio_index_ida for
assigning index numbers to virtio devices during registration.
The ida core may allocate some internal idr cache layers and
an ida bitmap upon any ida allocation, and all these layers are
truely freed only upon the ida destruction. The virtio_index_ida
is not destroyed at present, leading to a memory leak when using
the virtio core as a module and atleast one virtio device is
registered and unregistered.
Fix this by invoking ida_destroy() in the virtio core module
exit.
Cc: stable@vger.kernel.org
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Balloon device is frequently used as a mean of cooperative memory control
in between guest and host to manage memory overcommitment. This is the
typical case for any hosting workload when KVM guest is provided for
end-user.
Though there is a problem in this setup. The end-user and hosting provider
have signed SLA agreement in which some amount of memory is guaranted for
the guest. The good thing is that this memory will be given to the guest
when the guest will really need it (f.e. with OOM in guest and with
VIRTIO_BALLOON_F_DEFLATE_ON_OOM configuration flag set). The bad thing
is that end-user does not know this.
Balloon by default reduce the amount of memory exposed to the end-user
each time when the page is stolen from guest or returned back by using
adjust_managed_page_count and thus /proc/meminfo shows reduced amount
of memory.
Fortunately the solution is simple, we should just avoid to call
adjust_managed_page_count with VIRTIO_BALLOON_F_DEFLATE_ON_OOM set.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
and rename it to release_pages_balloon. The function originally takes
arrays of pfns and now it takes pointer to struct virtio_ballon.
This change is necessary to conditionally call adjust_managed_page_count
in the next patch.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Added the match table and pointers for ACPI probing to the driver.
This uses the same identifier for virt devices as being used for qemu
ARM64 ACPI support.
d0bf1955a3
Signed-off-by: Graeme Gregory <graeme.gregory@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Spec requires a device reset during cleanup, so do it and avoid warn
in virtio core. And detach unused buffers to avoid memory leak.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
I have just queued some more bugfix patches today but none fix regressions and
none are related to these ones, so it looks like a good time for a merge for
-rc1.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJVk7JOAAoJECgfDbjSjVRpHgEIAKrgLd7gIQ8lO+LCYqne6WLQ
Ky8rOUnaxX4gD5N0akhfJFr/m/yIyAfk9+ALZZUo3kfuFiEsT2rn32iK/2Gj8pcu
HFoAWhS+7b/ZsfpHRPtv/zVD3q4c3nWsWpfWK09J+4t0UJuC8fmGMoBzkS0kjZtd
dQnHlJi5+1u4ch2x9sYYeVx7GOJ8a1W0q7cWJnWdOffWLEP9/zB8fgRVLFp/7AAd
uBlza93RU81wS7q5tSUph6ESPqt2yu357e//4jnWjVx5EUXDRBL3A/T1JpC1qYSn
WV2Gv14x+LVz2G8WgGmwfMq1H9Dvd/OzNToX5R8SIRx6Rh5L6gxFQjqt4dclGj8=
=nKap
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio/vhost cross endian support from Michael Tsirkin:
"I have just queued some more bugfix patches today but none fix
regressions and none are related to these ones, so it looks like a
good time for a merge for -rc1.
The motivation for this is support for legacy BE guests on the new LE
hosts. There are two redeeming properties that made me merge this:
- It's a trivial amount of code: since we wrap host/guest accesses
anyway, almost all of it is well hidden from drivers.
- Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY,
and when it's clear, there's zero overhead (as some point it was
tested by compiling with and without the patches, got the same
stripped binary).
Maybe we could create a Kconfig symbol to enforce the second point:
prevent people from enabling it eg on x86. I will look into this"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio-pci: alloc only resources actually used.
macvtap/tun: cross-endian support for little-endian hosts
vhost: cross-endian support for legacy devices
virtio: add explicit big-endian support to memory accessors
vhost: introduce vhost_is_little_endian() helper
vringh: introduce vringh_is_little_endian() helper
macvtap: introduce macvtap_is_little_endian() helper
tun: add tun_is_little_endian() helper
virtio: introduce virtio_is_little_endian() helper
Main excitement here is Peter Zijlstra's lockless rbtree optimization to
speed module address lookup. He found some abusers of the module lock
doing that too.
A little bit of parameter work here too; including Dan Streetman's breaking
up the big param mutex so writing a parameter can load another module (yeah,
really). Unfortunately that broke the usual suspects, !CONFIG_MODULES and
!CONFIG_SYSFS, so those fixes were appended too.
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVkgKHAAoJENkgDmzRrbjxQpwQAJVmBN6jF3SnwbQXv9vRixjH
58V33sb1G1RW+kXxQ3/e8jLX/4VaN479CufruXQp+IJWXsN/CH0lbC3k8m7u50d7
b1Zeqd/Yrh79rkc11b0X1698uGCSMlzz+V54Z0QOTEEX+nSu2ZZvccFS4UaHkn3z
rqDo00lb7rxQz8U25qro2OZrG6D3ub2q20TkWUB8EO4AOHkPn8KWP2r429Axrr0K
wlDWDTTt8/IsvPbuPf3T15RAhq1avkMXWn9nDXDjyWbpLfTn8NFnWmtesgY7Jl4t
GjbXC5WYekX3w2ZDB9KaT/DAMQ1a7RbMXNSz4RX4VbzDl+yYeSLmIh2G9fZb1PbB
PsIxrOgy4BquOWsJPm+zeFPSC3q9Cfu219L4AmxSjiZxC3dlosg5rIB892Mjoyv4
qxmg6oiqtc4Jxv+Gl9lRFVOqyHZrTC5IJ+xgfv1EyP6kKMUKLlDZtxZAuQxpUyxR
HZLq220RYnYSvkWauikq4M8fqFM8bdt6hLJnv7bVqllseROk9stCvjSiE3A9szH5
OgtOfYV5GhOeb8pCZqJKlGDw+RoJ21jtNCgOr6DgkNKV9CX/kL/Puwv8gnA0B0eh
dxCeB7f/gcLl7Cg3Z3gVVcGlgak6JWrLf5ITAJhBZ8Lv+AtL2DKmwEWS/iIMRmek
tLdh/a9GiCitqS0bT7GE
=tWPQ
-----END PGP SIGNATURE-----
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"Main excitement here is Peter Zijlstra's lockless rbtree optimization
to speed module address lookup. He found some abusers of the module
lock doing that too.
A little bit of parameter work here too; including Dan Streetman's
breaking up the big param mutex so writing a parameter can load
another module (yeah, really). Unfortunately that broke the usual
suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
appended too"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
modules: only use mod->param_lock if CONFIG_MODULES
param: fix module param locks when !CONFIG_SYSFS.
rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
module: add per-module param_lock
module: make perm const
params: suppress unused variable error, warn once just in case code changes.
modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
kernel/module.c: avoid ifdefs for sig_enforce declaration
kernel/workqueue.c: remove ifdefs over wq_power_efficient
kernel/params.c: export param_ops_bool_enable_only
kernel/params.c: generalize bool_enable_only
kernel/module.c: use generic module param operaters for sig_enforce
kernel/params: constify struct kernel_param_ops uses
sysfs: tightened sysfs permission checks
module: Rework module_addr_{min,max}
module: Use __module_address() for module_address_lookup()
module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
module: Optimize __module_address() using a latched RB-tree
rbtree: Implement generic latch_tree
seqlock: Introduce raw_read_seqcount_latch()
...
Move resource allocation from common code to legacy and modern code.
Only request resources actually used, i.e. bar0 in legacy mode and
the bar(s) specified by capabilities in modern mode.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The cpumask vp_dev->msix_affinity_masks[info->msix_vector] may contain
staled information when vp_set_vq_affinity() gets called, so clear it
before setting the new cpu bit mask.
Cc: stable@vger.kernel.org
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Most code already uses consts for the struct kernel_param_ops,
sweep the kernel for the last offending stragglers. Other than
include/linux/moduleparam.h and kernel/params.c all other changes
were generated with the following Coccinelle SmPL patch. Merge
conflicts between trees can be handled with Coccinelle.
In the future git could get Coccinelle merge support to deal with
patch --> fail --> grammar --> Coccinelle --> new patch conflicts
automatically for us on patches where the grammar is available and
the patch is of high confidence. Consider this a feature request.
Test compiled on x86_64 against:
* allnoconfig
* allmodconfig
* allyesconfig
@ const_found @
identifier ops;
@@
const struct kernel_param_ops ops = {
};
@ const_not_found depends on !const_found @
identifier ops;
@@
-struct kernel_param_ops ops = {
+const struct kernel_param_ops ops = {
};
Generated-by: Coccinelle SmPL
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Junio C Hamano <gitster@pobox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: cocci@systeme.lip6.fr
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The PCI core now disables MSI and MSI-X for all devices during enumeration
regardless of CONFIG_PCI_MSI. Remove device-specific code to disable
MSI/MSI-X.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
virtio_device_is_legacy_only is now unused, drop
it from core.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio_device_is_legacy_only is always false now,
drop the test from virtio pci.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio_device_is_legacy_only is always false now,
drop the test from virtio mmio.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We added transitional device support to balloon driver,
so we don't need to black-list it in core anymore.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Virtio 1.0 doesn't include a modern balloon device.
But it's not a big change to support a transitional
balloon device: this has the advantage of supporting
existing drivers, transparently.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As Rusty noted, we were accessing queue_enable with an incorrect width.
Switch to type-safe accessors so we don't make this mistake again in the
future.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The spec is very clear on this:
4.1.3.1 Driver Requirements: PCI Device Layout
The driver MUST access each field using the “natural” access method,
i.e. 32-bit accesses for 32-bit fields, 16-bit accesses for 16-bit
fields and 8-bit accesses for 8-bit fields.
Add type-safe wrappers to prevent access with incorrect width.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio-input is basically evdev-events-over-virtio, so this driver isn't
much more than reading configuration from config space and forwarding
incoming events to the linux input layer.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Going over the virtio mmio code, I noticed that it doesn't correctly
access modern device config values using "natural" accessors: it uses
readb to get/set them byte by byte, while the virtio 1.0 spec explicitly states:
4.2.2.2 Driver Requirements: MMIO Device Register Layout
...
The driver MUST only use 32 bit wide and aligned reads and writes to
access the control registers described in table 4.1.
For the device-specific configuration space, the driver MUST use
8 bit wide accesses for 8 bit wide fields, 16 bit wide and aligned
accesses for 16 bit wide fields and 32 bit wide and aligned accesses for
32 and 64 bit wide fields.
Borrow code from virtio_pci_modern to do this correctly.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio_mmio currently lacks generation support which
makes multi-byte field access racy.
Fix by getting the value at offset 0xfc for version 2
devices. Nothing we can do for version 1, so return
generation id 0.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio balloon has this code:
wait_event_interruptible(vb->config_change,
(diff = towards_target(vb)) != 0
|| vb->need_stats_update
|| kthread_should_stop()
|| freezing(current));
Which is a problem because towards_target() call might block after
wait_event_interruptible sets task state to TAST_INTERRUPTIBLE, causing
the task_struct::state collision typical of nesting of sleeping
primitives
See also http://lwn.net/Articles/628628/ or Thomas's
bug report
http://article.gmane.org/gmane.linux.kernel.virtualization/24846
for a fuller explanation.
To fix, rewrite using wait_woken.
Cc: stable@vger.kernel.org
Reported-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio spec requires that all drivers set DRIVER_OK
before using devices. While balloon isn't yet
included in the virtio 1 spec, previous spec versions
also required this.
virtio balloon might violate this rule: probe calls
kthread_run before setting DRIVER_OK, which might run
immediately and cause balloon to inflate/deflate.
To fix, call virtio_device_ready before running the kthread.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
Since PCI is little endian, 8-bit access might work, but the spec section
is very clear on this:
4.1.3.1 Driver Requirements: PCI Device Layout
The driver MUST access each field using the “natural” access method,
i.e. 32-bit accesses for 32-bit fields, 16-bit accesses for 16-bit
fields and 8-bit accesses for 8-bit fields.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
The virtqueue_add() calls START_USE() upon entry. The virtqueue_kick() is
called if vq->num_added == (1 << 16) - 1 before calling END_USE().
The virtqueue_kick_prepare() called via virtqueue_kick() calls START_USE()
upon entry, and will call panic() if DEBUG is enabled.
Move this virtqueue_kick() call to after END_USE() call.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch add a support for second version of the virtio-mmio device,
which follows OASIS "Virtual I/O Device (VIRTIO) Version 1.0"
specification.
Main changes:
1. The control register symbolic names use the new device/driver
nomenclature rather than the old guest/host one.
2. The driver detect the device version (version 1 is the pre-OASIS
spec, version 2 is compatible with fist revision of the OASIS spec)
and drives the device accordingly.
3. New version uses direct addressing (64 bit address split into two
low/high register) instead of the guest page size based one,
and addresses each part of the queue (descriptors, available, used)
separately.
4. The device activity is now explicitly triggered by writing to the
"queue ready" register.
5. Whole 64 bit features are properly handled now (both ways).
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
release function in modern driver is unused:
it's a left-over from when each driver had
to have its own release.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If set, try legacy interface first, modern one if that fails. Useful to
work around device/driver bugs, and for compatibility testing.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Useful for testing device virtio 1 compatibility.
Based on patch by Rusty - couldn't resist putting
that flying car joke in there!
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The ABI *is* stable, and has been for a while now.
Drop Kconfig warning saying that it's not guaranteed
to work.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Most of our code has
struct foo {
}
Fix one instances where ring is inconsistent.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Most of our code has
struct foo {
}
Fix two instances where balloon is inconsistent.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Virtio 1.0 spec lists device config as optional.
Set get/set callbacks to NULL. Drivers can check that
and fail gracefully.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We don't know the # of VQs that drivers are going to use so it's hard to
predict how much memory we'll need to map. However, the relevant
capability does give us an upper limit.
If that's below a page, we can reduce the number of required
mappings by mapping it all once ahead of the time.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Lightly tested against qemu.
One thing *not* implemented here is separate mappings
for descriptor/avail/used rings. That's nice to have,
will be done later after we have core support.
This also exposes the PCI layout to userspace, and
adds macros for PCI layout offsets:
QEMU wants it, so why not? Trust, but verify.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Most of initialization is device-independent.
Let's move it to common.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Device VQs were getting freed twice: once in every device's removal
functions, and then again in virtio_pci_legacy_remove(). The ones in
devices are called first, so drop the useless second call.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/balloon needs config space access so make it
fail gracefully if not there.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The reason we defer kfree until release function is because it's a
general rule for kobjects: kfree of the reference counter itself is only
legal in the release function.
Previous patch didn't make this clear, document this in code.
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A struct device which has just been unregistered can live on past the
point at which a driver decides to drop it's initial reference to the
kobject gained on allocation.
This implies that when releasing a virtio device, we can't free a struct
virtio_device until the underlying struct device has been released,
which might not happen immediately on device_unregister().
Unfortunately, this is exactly what virtio pci does:
it has an empty release callback, and frees memory immediately
after unregistering the device.
This causes an easy to reproduce crash if CONFIG_DEBUG_KOBJECT_RELEASE
it enabled.
To fix, free the memory only once we know the device is gone in the release
callback.
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It turns out we need to add device-specific code
in release callback. Move it to virtio_pci_legacy.c.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Most importantly, this fixes using virtio_pci as a module.
Further, the big virtio 1.0 conversion missed a couple of places. This fixes
them up.
This isn't 100% sparse-clean yet because on many architectures get_user
triggers sparse warnings when used with __bitwise tag (when same tag is on both
pointer and value read).
I posted a patchset to fix it up by adding __force on all
arches that don't already have it (many do), when that's
merged these warnings will go away.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJUkSztAAoJECgfDbjSjVRp5DkH/ibw+0ZaEFP/SXWnw6WONpaG
pzMsrfMG/vxlOfutSUdDqG+oqqU2fSLvFq5qDK6Xk9/emRSwGduz29ZaxGh8J1MZ
/Ojqtu/HSLl+UASTs+fMw49itghoIjmAPBwwMkQjvanfqLclgdz9UxzoCOc4YkO0
PJQA/Vw6blVSP1m0p97PvzZkAiIetI2ixZn2vPJZc8vSkOHtygM9HdXKTv785HbG
ycRbR9B3OBMvq26FIuWeyuY93FnyX2Qtf2bzwSSRdzo7qlsNhVVG7sKyWOOR+1xG
TLmjhyTF57oA2GgZCVfgnFxsfiuIKMumfG0jbABTXmBGgA/ULef7HcF/lzLgdq8=
=32cU
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael S Tsirkin:
"virtio 1.0 related fixes
Most importantly, this fixes using virtio_pci as a module.
Further, the big virtio 1.0 conversion missed a couple of places.
This fixes them up.
This isn't 100% sparse-clean yet because on many architectures
get_user triggers sparse warnings when used with __bitwise tag (when
same tag is on both pointer and value read).
I posted a patchset to fix it up by adding __force on all arches that
don't already have it (many do), when that's merged these warnings
will go away"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_pci: restore module attributes
mic/host: fix up virtio 1.0 APIs
vringh: update for virtio 1.0 APIs
vringh: 64 bit features
tools/virtio: add virtio 1.0 in vringh_test
tools/virtio: add virtio 1.0 in virtio_test
tools/virtio: enable -Werror
tools/virtio: 64 bit features
tools/virtio: fix vringh test
tools/virtio: more stubs
virtio: core support for config generation
virtio_pci: add VIRTIO_PCI_NO_LEGACY
virtio_pci: move probe to common file
virtio_pci_common.h: drop VIRTIO_PCI_NO_LEGACY
virtio_config: fix virtio_cread_bytes
virtio: set VIRTIO_CONFIG_S_FEATURES_OK on restore
bug which doesn't merit cc: stable.
All the exciting stuff went via MST this cycle.
Thanks,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUkPhtAAoJENkgDmzRrbjxnEoP/iXZ2d0pwbSQSSw12wXec7BO
aT698KzTHg2naRqWEhRbE5ujGW9ZR1hbkH3r70R2BlM5wZTwCAEkJB4SRfIu6u0R
fv9GUgZTdSIlN2+yBW1eYHa2QlB8V/jx3P2Vyw3/ejGQ2vC6B7JbNsdCZJoyuH5A
jZojZUrltfEdPVO4JkZ9sSzo64fzyoHGj8YYilH8ygcyZOshAvKvP/gOsFyRzyaj
OPCU1CImBj5i1w2dFSErhKPJp/WQl7La1dKwTVE4lEGKadKDmqnStV0JKuvcRPv6
XB/9vwdRXtFIxRUKDH8Wj7DrkKRCsBh6DJUl+MC+c2TVLNu6v0D/WGOlmio2rvoZ
6CGwNBQg4Ex/J8CkcYsgISn/jzDS+0oaZK0P+IYNJtv5e7D6jPJcDoZ1yPXh1uPn
9ZYZlKa08o5/991f3HM2kJKWDP/OdzgHQH6w2exY01Zsz/6TATLvU7xoPQuHelWe
mRd1kbW4ONIUAzIGPhd5PvtnmJ2L82+tN3VmYu50XAl2oSF8fzvQ/VHl9hgyu/vI
WREA7/40e2DYkhRbu7focdEAFYo0HnuRq6Kwp5zYjbfgyBsiaY7DVvAUlnhwhMES
kSFKejGUP00gofrXdffTYNu5LdKMN8eaVT7pMyxNXlQHSzXe3Vc6oqnpXzP17ii7
ahqmeDZKNUF3GSOHc4IC
=COY0
-----END PGP SIGNATURE-----
Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio updates from Rusty Russell:
"A balloon enhancement, and a minor race-on-module-unload theoretical
bug which doesn't merit cc: stable.
All the exciting stuff went via MST this cycle"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
virtio_balloon: free some memory from balloon on OOM
virtio_balloon: return the amount of freed memory from leak_balloon()
virtio_blk: fix race at module removal
virtio: Fix comment typo 'CONFIG_S_FAILED'
When the virtio_pci driver was moved into virtio_pci_legacy.c the module
licence and other attributes went AWOL. This patch restores them.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Here's the set of driver core patches for 3.19-rc1.
They are dominated by the removal of the .owner field in platform
drivers. They touch a lot of files, but they are "simple" changes, just
removing a line in a structure.
Other than that, a few minor driver core and debugfs changes. There are
some ath9k patches coming in through this tree that have been acked by
the wireless maintainers as they relied on the debugfs changes.
Everything has been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlSOD20ACgkQMUfUDdst+ylLPACg2QrW1oHhdTMT9WI8jihlHVRM
53kAoLeteByQ3iVwWurwwseRPiWa8+MI
=OVRS
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core update from Greg KH:
"Here's the set of driver core patches for 3.19-rc1.
They are dominated by the removal of the .owner field in platform
drivers. They touch a lot of files, but they are "simple" changes,
just removing a line in a structure.
Other than that, a few minor driver core and debugfs changes. There
are some ath9k patches coming in through this tree that have been
acked by the wireless maintainers as they relied on the debugfs
changes.
Everything has been in linux-next for a while"
* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
fs: debugfs: add forward declaration for struct device type
firmware class: Deletion of an unnecessary check before the function call "vunmap"
firmware loader: fix hung task warning dump
devcoredump: provide a one-way disable function
device: Add dev_<level>_once variants
ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
ath: use seq_file api for ath9k debugfs files
debugfs: add helper function to create device related seq_file
drivers/base: cacheinfo: remove noisy error boot message
Revert "core: platform: add warning if driver has no owner"
drivers: base: support cpu cache information interface to userspace via sysfs
drivers: base: add cpu_device_create to support per-cpu devices
topology: replace custom attribute macros with standard DEVICE_ATTR*
cpumask: factor out show_cpumap into separate helper function
driver core: Fix unbalanced device reference in drivers_probe
driver core: fix race with userland in device_add()
sysfs/kernfs: make read requests on pre-alloc files use the buffer.
sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
fs: sysfs: return EGBIG on write if offset is larger than file size
...
virtio 1.0 devices require that drivers set VIRTIO_CONFIG_S_FEATURES_OK
after finalizing features.
virtio core missed doing this on restore, fix it up.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
kbuild does not seem to like it when we name source
files same as the module.
Let's rename virtio_pci -> virtio_pci_common,
and get rid of #include-ing c files.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Move everything dealing with legacy devices out to virtio_pci_legacy.c.
Expose common code APIs in virtio_pci.h
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
VQ setup is mostly version-specific, add another level of indirection to
split the version-independent code out.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
VQ deletion is mostly version-specific, add another level of indirection
to split the version-independent code out.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
slightly reduce the amount of pointer chasing this needs to do.
More importantly, this will easily generalize to virtio 1.0.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We don't need to go from vq to vq info on
data path, so using direct vq->priv pointer for that
seems like a waste.
Let's build an array of vq infos, then we can use vq->index
for that lookup.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
legacy_only flag is now unused, drop it from core.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
we have blacklisted balloon in core, no need
for a driver flag.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
transports need to be able to detect legacy-only
devices (ATM balloon only) to use legacy path
to drive them.
Add a core API to do just that.
The implementation just blacklists balloon:
not too pretty, but let's not over-engineer.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
We have no plans to support virtio 1.0 in balloon driver. Add an
explicit flag to mark it legacy only.
This will be used by follow-up patches.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio-blk has some legacy feature bits that modern drivers
must not negotiate, but are needed for old legacy hosts
(that e.g. don't support virtio-scsi).
Allow a separate legacy feature table for such cases.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Now that we use u64 for bits, we can simply & them together.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
For virtio-1, we can theoretically have a more complex virtqueue
layout with avail and used buffers not on a contiguous memory area
with the descriptor table. For now, it's fine for a transport driver
to stay with the old layout: It needs, however, a way to access
the locations of the avail/used rings so it can register them with
the host.
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Use virtioXX_to_cpu and friends for access to
all multibyte structures in memory.
Note: this is intentionally mechanical.
A follow-up patch will split long lines etc.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
At this point, no transports set any of the high 32 feature bits.
Since transports generally can't (yet) cope with such bits, add BUG_ON
checks to make sure they are not set by mistake.
Based on rproc patch by Rusty.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Change u32 to u64, and use BIT_ULL and 1ULL everywhere.
Note: transports are unchanged, and only set low 32 bit.
This guarantees that no transport sets e.g. VERSION_1
by mistake without proper support.
Based on patch by Rusty.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
It seemed like a good idea to use bitmap for features
in struct virtio_device, but it's actually a pain,
and seems to become even more painful when we get more
than 32 feature bits. Just change it to a u32 for now.
Based on patch by Rusty.
Suggested-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless
it is often the case that these control tools does not have enough time
to react on fast changing memory load. As a result OS runs out of memory
and invokes OOM-killer. The balancing of memory by use of the virtio
balloon should not cause the termination of processes while there are
pages in the balloon. Now there is no way for virtio balloon driver to
free some memory at the last moment before some process will be get
killed by OOM-killer.
This does not provide a security breach as balloon itself is running
inside guest OS and is working in the cooperation with the host. Thus
some improvements from guest side should be considered as normal.
To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in out_of_memory()
function. If virtio balloon could release some memory, it will make
the system to return and retry the allocation that forced the out of
memory killer to run.
Allocate virtio feature bit for this: it is not set by default,
the the guest will not deflate virtio balloon on OOM without explicit
permission from host.
Signed-off-by: Raushaniya Maksudova <rmaksudova@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This value would be useful in the next patch to provide the amount of
the freed memory for OOM killer.
Signed-off-by: Raushaniya Maksudova <rmaksudova@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
been sitting in MST's tree during my vacation. I changed a function name
and made one trivial change, then they spent two days in linux-next.
Thanks,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUQFBQAAoJENkgDmzRrbjxJRIP/1yCQRElQewxURSmJelyqCdU
0mHYB0R9Mf3tfre1xnofqs2lWeSMc/4ptKHsVR6pupoztSwnz7HsLHfEFvFJh4mj
KsaqYElxkNxTcfyHwLjyJS0/J6tG1tYypXGiimTBS0bvFHL3XZdimVgJ6WvX+gO7
YSaDEX8/EqCERafslS5+gKJlz3drDOnCZCe9y4BDSmsvl2k7bkpSxIn8vsR6jIC0
c5JpUy6QVF+3XA/J932M7yRs+xpqxNoUWiyY3ar9o3CtQAaQB0ZAetSxY6hTfvVc
GlNFzCifdsaQwsl2SVsE2h6tWaRhtMtcGWQuhHThIPyIf8XxhYyBRY2FLo70LMz1
eqtwy6F/Bg/nzUsdee4PZBMeoKHlAEL12RpsEKgfUoLzj16Aqa8ll+Agbglbkw8G
f3d2FwzKAlpY5NwHETC1wYy52PJ3efqksRWuhokmYpxNSbHJS/lsiJOE7272/4Qr
MtXuvRmo22tf34XFd5y7zqWjgZ58eeFOqQWi/K+6ZgpqVOvikjrXXKEuiVdjO0ZD
kTVR/sQKiR+79rzENk80XBhWaMveECNXF1TiZ/3MmURkmEOBRQMxRQ20BX3exvna
AJ/WVA5DcfXZc1yyqknE1NLGrvSBMJENH13x2QPwrqNWAryOOKuF1VKKIwWlDw5j
vtx5nXiJa8YYdxI2TJCN
=JK6x
-----END PGP SIGNATURE-----
Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio updates from Rusty Russell:
"One cc: stable commit, the rest are a series of minor cleanups which
have been sitting in MST's tree during my vacation. I changed a
function name and made one trivial change, then they spent two days in
linux-next"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits)
virtio-rng: refactor probe error handling
virtio_scsi: drop scan callback
virtio_balloon: enable VQs early on restore
virtio_scsi: fix race on device removal
virito_scsi: use freezable WQ for events
virtio_net: enable VQs early on restore
virtio_console: enable VQs early on restore
virtio_scsi: enable VQs early on restore
virtio_blk: enable VQs early on restore
virtio_scsi: move kick event out from virtscsi_init
virtio_net: fix use after free on allocation failure
9p/trans_virtio: enable VQs early
virtio_console: enable VQs early
virtio_blk: enable VQs early
virtio_net: enable VQs early
virtio: add API to enable VQs early
virtio_net: minor cleanup
virtio-net: drop config_mutex
virtio_net: drop config_enable
virtio-blk: drop config_mutex
...
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after resume returns, virtio balloon
violated this rule by adding bufs, which causes the VQ to be used
directly within restore.
To fix, call virtio_device_ready before using VQ.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Defer config changed notifications that arrive during
probe/scan/freeze/restore.
This will allow drivers to set DRIVER_OK earlier, without worrying about
racing with config change interrupts.
This change will also benefit old hypervisors (before 2009)
that send interrupts without checking DRIVER_OK: previously,
the callback could race with driver-specific initialization.
This will also help simplify drivers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cosmetic changes)
This is in preparation to extending config changed event handling
in core.
Wrapping these in an API also seems to make for a cleaner code.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Replace duplicated code in all transports with a single wrapper in
virtio.c.
The only functional change is in virtio_mmio.c: if a buggy device sends
us an interrupt before driver is set, we previously returned IRQ_NONE,
now we return IRQ_HANDLED.
As this must not happen in practice, this does not look like a big deal.
See also commit 3fff0179e3
virtio-pci: do not oops on config change if driver not loaded.
for the original motivation behind the driver check.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On restore, virtio pci does the following:
+ set features
+ init vqs etc - device can be used at this point!
+ set ACKNOWLEDGE,DRIVER and DRIVER_OK status bits
This is in violation of the virtio spec, which
requires the following order:
- ACKNOWLEDGE
- DRIVER
- init vqs
- DRIVER_OK
This behaviour will break with hypervisors that assume spec compliant
behaviour. It seems like a good idea to have this patch applied to
stable branches to reduce the support butden for the hypervisors.
Cc: stable@vger.kernel.org
Cc: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Always mark pages with PageBalloon even if balloon compaction is disabled
and expose this mark in /proc/kpageflags as KPF_BALLOON.
Also this patch adds three counters into /proc/vmstat: "balloon_inflate",
"balloon_deflate" and "balloon_migrate". They accumulate balloon
activity. Current size of balloon is (balloon_inflate - balloon_deflate)
pages.
All generic balloon code now gathered under option CONFIG_MEMORY_BALLOON.
It should be selected by ballooning driver which wants use this feature.
Currently virtio-balloon is the only user.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now ballooned pages are detected using PageBalloon(). Fake mapping is no
longer required. This patch links ballooned pages to balloon device using
field page->private instead of page->mapping. Also this patch embeds
balloon_dev_info directly into struct virtio_balloon.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sasha Levin reported KASAN splash inside isolate_migratepages_range().
Problem is in the function __is_movable_balloon_page() which tests
AS_BALLOON_MAP in page->mapping->flags. This function has no protection
against anonymous pages. As result it tried to check address space flags
inside struct anon_vma.
Further investigation shows more problems in current implementation:
* Special branch in __unmap_and_move() never works:
balloon_page_movable() checks page flags and page_count. In
__unmap_and_move() page is locked, reference counter is elevated, thus
balloon_page_movable() always fails. As a result execution goes to the
normal migration path. virtballoon_migratepage() returns
MIGRATEPAGE_BALLOON_SUCCESS instead of MIGRATEPAGE_SUCCESS,
move_to_new_page() thinks this is an error code and assigns
newpage->mapping to NULL. Newly migrated page lose connectivity with
balloon an all ability for further migration.
* lru_lock erroneously required in isolate_migratepages_range() for
isolation ballooned page. This function releases lru_lock periodically,
this makes migration mostly impossible for some pages.
* balloon_page_dequeue have a tight race with balloon_page_isolate:
balloon_page_isolate could be executed in parallel with dequeue between
picking page from list and locking page_lock. Race is rare because they
use trylock_page() for locking.
This patch fixes all of them.
Instead of fake mapping with special flag this patch uses special state of
page->_mapcount: PAGE_BALLOON_MAPCOUNT_VALUE = -256. Buddy allocator uses
PAGE_BUDDY_MAPCOUNT_VALUE = -128 for similar purpose. Storing mark
directly in struct page makes everything safer and easier.
PagePrivate is used to mark pages present in page list (i.e. not
isolated, like PageLRU for normal pages). It replaces special rules for
reference counter and makes balloon migration similar to migration of
normal pages. This flag is protected by page_lock together with link to
the balloon device.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/p/53E6CEAA.9020105@oracle.com
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: <stable@vger.kernel.org> [3.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
virtqueue_add() populates the virtqueue descriptor table from the sgs
given. If it uses an indirect descriptor table, then it puts a single
descriptor in the descriptor table pointing to the kmalloc'ed indirect
table where the sg is populated.
Previously vring_add_indirect() did the allocation and the simple
linear layout. We replace that with alloc_indirect() which allocates
the indirect table then chains it like the normal descriptor table so
we can reuse the core logic.
This slows down pktgen by less than 1/2 a percent (which uses direct
descriptors), as well as vring_bench, but it's far neater.
vring_bench before:
1061485790-1104800648(1.08254e+09+/-6.6e+06)ns
vring_bench after:
1125610268-1183528965(1.14172e+09+/-8e+06)ns
pktgen before:
787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0
pktgen after:
779988-790404(786391+/-2.5e+03)pps 361-366(364.35+/-1.3)Mb/sec (361914432-366747456(3.64885e+08+/-1.2e+06)bps) errors: 0
Now, if we make force indirect descriptors by turning off any_header_sg
in virtio_net.c:
pktgen before:
713773-721062(718374+/-2.1e+03)pps 331-334(332.95+/-0.92)Mb/sec (331190672-334572768(3.33325e+08+/-9.6e+05)bps) errors: 0
pktgen after:
710542-719195(714898+/-2.4e+03)pps 329-333(331.15+/-1.1)Mb/sec (329691488-333706480(3.31713e+08+/-1.1e+06)bps) errors: 0
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We used to have several callers which just used arrays. They're
gone, so we can use sg_next() everywhere, simplifying the code.
On my laptop, this slowed down vring_bench by 15%:
vring_bench before:
936153354-967745359(9.44739e+08+/-6.1e+06)ns
vring_bench after:
1061485790-1104800648(1.08254e+09+/-6.6e+06)ns
However, a more realistic test using pktgen on a AMD FX(tm)-8320 saw
a few percent improvement:
pktgen before:
767390-792966(785159+/-6.5e+03)pps 356-367(363.75+/-2.9)Mb/sec (356068960-367936224(3.64314e+08+/-3e+06)bps) errors: 0
pktgen after:
787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to meet
kernel coding style guidelines. This issue was reported by checkpatch.
Signed-off-by: Benoit Taine <benoit.taine@lip6.fr>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>