Commit Graph

48 Commits

Author SHA1 Message Date
Alex Williamson
93899a679f vfio-pci: Fix remove path locking
Locking both the remove() and release() path results in a deadlock
that should have been obvious.  To fix this we can get and hold the
vfio_device reference as we evaluate whether to do a bus/slot reset.
This will automatically block any remove() calls, allowing us to
remove the explict lock.  Fixes 61d792562b.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org	[3.17]
2014-09-29 17:18:39 -06:00
Gavin Shan
b8f02af096 vfio/pci: Restore MSIx message prior to enabling
The MSIx vector table lives in device memory, which may be cleared as
part of a backdoor device reset. This is the case on the IBM IPR HBA
when the BIST is run on the device. When assigned to a QEMU guest,
the guest driver does a pci_save_state(), issues a BIST, then does a
pci_restore_state(). The BIST clears the MSIx vector table, but due
to the way interrupts are configured the pci_restore_state() does not
restore the vector table as expected. Eventually this results in an
EEH error on Power platforms when the device attempts to signal an
interrupt with the zero'd table entry.

Fix the problem by restoring the host cached MSI message prior to
enabling each vector.

Reported-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-09-29 10:16:24 -06:00
Alexey Kardashevskiy
9b936c960f drivers/vfio: Enable VFIO if EEH is not supported
The existing vfio_pci_open() fails upon error returned from
vfio_spapr_pci_eeh_open(), which breaks POWER7's P5IOC2 PHB
support which this patch brings back.

The patch fixes the issue by dropping the return value of
vfio_spapr_pci_eeh_open().

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-08-08 10:39:16 -06:00
Alex Williamson
bc4fba7712 vfio-pci: Attempt bus/slot reset on release
Each time a device is released, mark whether a local reset was
successful or whether a bus/slot reset is needed.  If a reset is
needed and all of the affected devices are bound to vfio-pci and
unused, allow the reset.  This is most useful when the userspace
driver is killed and releases all the devices in an unclean state,
such as when a QEMU VM quits.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-08-07 11:12:07 -06:00
Alex Williamson
61d792562b vfio-pci: Use mutex around open, release, and remove
Serializing open/release allows us to fix a refcnt error if we fail
to enable the device and lets us prevent devices from being unbound
or opened, giving us an opportunity to do bus resets on release.  No
restriction added to serialize binding devices to vfio-pci while the
mutex is held though.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-08-07 11:12:04 -06:00
Alex Williamson
9c22e660ce vfio-pci: Release devices with BusMaster disabled
Our current open/release path looks like this:

vfio_pci_open
  vfio_pci_enable
    pci_enable_device
    pci_save_state
    pci_store_saved_state

vfio_pci_release
  vfio_pci_disable
    pci_disable_device
    pci_restore_state

pci_enable_device() doesn't modify PCI_COMMAND_MASTER, so if a device
comes to us with it enabled, it persists through the open and gets
stored as part of the device saved state.  We then restore that saved
state when released, which can allow the device to attempt to continue
to do DMA.  When the group is disconnected from the domain, this will
get caught by the IOMMU, but if there are other devices in the group,
the device may continue running and interfere with the user.  Even in
the former case, IOMMUs don't necessarily behave well and a stream of
blocked DMA can result in unpleasant behavior on the host.

Explicitly disable Bus Master as we're enabling the device and
slightly re-work release to make sure that pci_disable_device() is
the last thing that touches the device.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-08-07 11:12:02 -06:00
Gavin Shan
1b69be5e8a drivers/vfio: EEH support for VFIO PCI device
The patch adds new IOCTL commands for sPAPR VFIO container device
to support EEH functionality for PCI devices, which have been passed
through from host to somebody else via VFIO.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:28:48 +10:00
Gavin Shan
fd49c81f08 drivers/vfio/pci: Fix wrong MSI interrupt count
According PCI local bus specification, the register of Message
Control for MSI (offset: 2, length: 2) has bit#0 to enable or
disable MSI logic and it shouldn't be part contributing to the
calculation of MSI interrupt count. The patch fixes the issue.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-05-30 11:35:54 -06:00
Alex Williamson
eb5685f0de vfio/pci: Fix unchecked return value
There's nothing we can do different if pci_load_and_free_saved_state()
fails, other than maybe print some log message, but the actual re-load
of the state is an unnecessary step here since we've only just saved
it.  We can cleanup a coverity warning and eliminate the unnecessary
step by freeing the state ourselves.

Detected by Coverity: CID 753101

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-05-30 11:35:53 -06:00
Alex Williamson
afa63252b2 vfio/pci: Fix sizing of DPA and THP express capabilities
When sizing the TPH capability we store the register containing the
table size into the 'dword' variable, but then use the uninitialized
'byte' variable to analyze the size.  The table size is also actually
reported as an N-1 value, so correct sizing to account for this.

The round_up() for both TPH and DPA is unnecessary, remove it.

Detected by Coverity: CID 714665 & 715156

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-05-30 10:50:31 -06:00
Alexander Gordeev
94cccde648 vfio: Use pci_enable_msi_range() and pci_enable_msix_range()
pci_enable_msix() and pci_enable_msi_block() have been deprecated; use
pci_enable_msix_range() and pci_enable_msi_range() instead.

[bhelgaas: changelog]
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
2014-02-14 14:23:40 -07:00
Linus Torvalds
2d08cd0ef8 - Convert to misc driver to support module auto loading
- Remove unnecessary and dangerous use of device_lock
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJS4T1aAAoJECObm247sIsiLMkQAKn11qSLk880dJO+cfW2VpqL
 vm3WreShOxNsKTEQfcbrnJtS+kYGkiyo8+Vve377QDIg+xeXpNZa0PbAEfh4HyuJ
 NH+vn0FYDf3rzF4Z9snm2HOHWCAsqLCE0guyMwwhUYH+fks4+YTJWCm4YWZOXbIS
 aI++G+5LBxGzMRbotoovqUhQy474fpQN/GFIlIZHP2bGtdeXiRuH0qH3BxVKafOF
 50RtAgJsLm1Nb+S20Umsz+llL3sop1wcPLKOlgCXqd+545CJAoq/qqEPjUtOrr+S
 qXvXhb0XM7zo73H2WzkVmS09HZWX4sMoHBYp8D6ToOIFF01LOVsXBKduPCPpTD7E
 zohhjgag9/RQ99l2B153IIIv0Bsg7b4YXBQ6qkWFoJolPL94LTq1PXk0f9mNhyPo
 mSqatcAeI5qtjqu9ncSd4YYTdBgp7SJcgwji8fI44tvzzpz8iheVUrpwcU1UumAK
 TpXLBoTLXllK1op3u9xgFrF4ISBMl7lZeZp3c1/1YRga7Ch6SdlB0tcLPfuSBRF0
 1Qb5jQrWz4qt/oZSkapcFALXQDgwoK8am9I0a5YXdhy9gSYm+o/TLwG5pTEnF4In
 xxuibmmJAmNcTJ9q6rUKjLLU/TqRKin+vyqW6is41IredgvNX8XDS3YokA5Fgv01
 mu1s7odFi08xjPRuYvmq
 =t9+R
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v3.14-rc1' of git://github.com/awilliam/linux-vfio

Pull vfio update from Alex Williamson:
 - convert to misc driver to support module auto loading
 - remove unnecessary and dangerous use of device_lock

* tag 'vfio-v3.14-rc1' of git://github.com/awilliam/linux-vfio:
  vfio-pci: Don't use device_lock around AER interrupt setup
  vfio: Convert control interface to misc driver
  misc: Reserve minor for VFIO
2014-01-24 17:42:31 -08:00
Alex Williamson
890ed578df vfio-pci: Use pci "try" reset interface
PCI resets will attempt to take the device_lock for any device to be
reset.  This is a problem if that lock is already held, for instance
in the device remove path.  It's not sufficient to simply kill the
user process or skip the reset if called after .remove as a race could
result in the same deadlock.  Instead, we handle all resets as "best
effort" using the PCI "try" reset interfaces.  This prevents the user
from being able to induce a deadlock by triggering a reset.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-01-15 10:43:17 -07:00
Alex Williamson
3be3a074cf vfio-pci: Don't use device_lock around AER interrupt setup
device_lock is much too prone to lockups.  For instance if we have a
pending .remove then device_lock is already held.  If userspace
attempts to modify AER signaling after that point, a deadlock occurs.
eventfd setup/teardown is already protected in vfio with the igate
mutex.  AER is not a high performance interrupt, so we can also use
the same mutex to protect signaling versus setup races.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-01-14 16:12:55 -07:00
Alex Williamson
274127a1fd PCI: Rename PCI_VC_PORT_REG1/2 to PCI_VC_PORT_CAP1/2
These are set of two capability registers, it's pretty much given that
they're registers, so reflect their purpose in the name.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-12-17 17:49:39 -07:00
Alex Williamson
8b27ee60bf vfio-pci: PCI hot reset interface
The current VFIO_DEVICE_RESET interface only maps to PCI use cases
where we can isolate the reset to the individual PCI function.  This
means the device must support FLR (PCIe or AF), PM reset on D3hot->D0
transition, device specific reset, or be a singleton device on a bus
for a secondary bus reset.  FLR does not have widespread support,
PM reset is not very reliable, and bus topology is dictated by the
system and device design.  We need to provide a means for a user to
induce a bus reset in cases where the existing mechanisms are not
available or not reliable.

This device specific extension to VFIO provides the user with this
ability.  Two new ioctls are introduced:
 - VFIO_DEVICE_PCI_GET_HOT_RESET_INFO
 - VFIO_DEVICE_PCI_HOT_RESET

The first provides the user with information about the extent of
devices affected by a hot reset.  This is essentially a list of
devices and the IOMMU groups they belong to.  The user may then
initiate a hot reset by calling the second ioctl.  We must be
careful that the user has ownership of all the affected devices
found via the first ioctl, so the second ioctl takes a list of file
descriptors for the VFIO groups affected by the reset.  Each group
must have IOMMU protection established for the ioctl to succeed.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-09-04 11:28:04 -06:00
Alex Williamson
17638db1b8 vfio-pci: Test for extended config space
Having PCIe/PCI-X capability isn't enough to assume that there are
extended capabilities.  Both specs define that the first capability
header is all zero if there are no extended capabilities.  Testing
for this avoids an erroneous message about hiding capability 0x0 at
offset 0x100.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-09-04 10:58:52 -06:00
Alex Williamson
20e7745784 vfio-pci: Use fdget() rather than eventfd_fget()
eventfd_fget() tests to see whether the file is an eventfd file, which
we then immediately pass to eventfd_ctx_fileget(), which again tests
whether the file is an eventfd file.  Simplify slightly by using
fdget() so that we only test that we're looking at an eventfd once.
fget() could also be used, but fdget() makes use of fget_light() for
another slight optimization.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-08-28 09:49:55 -06:00
Alex Williamson
d24cdbfd28 vfio-pci: Avoid deadlock on remove
If an attempt is made to unbind a device from vfio-pci while that
device is in use, the request is blocked until the device becomes
unused.  Unfortunately, that unbind path still grabs the device_lock,
which certain things like __pci_reset_function() also want to take.
This means we need to try to acquire the locks ourselves and use the
pre-locked version, __pci_reset_function_locked().

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-07-24 16:36:41 -06:00
Al Viro
a47df1518e vfio: remap_pfn_range() sets all those flags...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:41 +04:00
Linus Torvalds
0b2e3b6bb4 vfio updates for v3.10
Changes include extension to support PCI AER notification to userspace, byte granularity of PCI config space and access to unarchitected PCI config space, better protection around IOMMU driver accesses, default file mode fix, and a few misc cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRgox9AAoJECObm247sIsizJwP/23KprR6BuRt168Obo+xC/lp
 kj1E4Hd4XHx6T/5XRexa4GqhmyLqgoqbS19uj/K6ebY9rouVKo0V6OYNFDQiFR/F
 yEGjYIBKV9/eBZMQI4RbZpZKGDXTeFrXyjsac1m51JrR3st8zsvZlNPE/TjzhSfz
 jMnsCC99ZwIFjmptw/yWwgInswy1n9H9iICS14Xn1x7v71iyOE32reTG6M9HsPHe
 Xm5F0K5iLQX8qoISHAomrTnFmCLxp5Y2qhh7nZWmC2gCbPBEnGdqx4prwzJayAaC
 DAN8osaYldoKQuwI4wFYMICHxxCEFk8xU58FeKCmeQJZgomefVAoRKf86gAEsSLR
 XHptBQOktWVKNFhzZWyOuAX3iua/hmWcOa05I6inycV1x2ZAGlNhvJnj1iKIphLk
 +neBFAD9wDcAJZdXkfudq0jJ9jJTSAWBv8d+hozNfkjTVhF+wRnhZ7V6dZfb9+W6
 kPGbgwqKApLoOabQbxbZ5Ftr6S7344prB/HAMywGa+xZfoxoYPxBHCi0rSTvUTWy
 oWLtzrKRbjBDc0qF4eEK9RZlmVt2ZmCUnUB3kbWED4mrJAHu4LlFghnAloePrIZ3
 kXlMARWbyw/E+1WIMO4Ito5+1s3zVsJJgzVsAQDJkTQw2aYDhns73/Y25Dj7x7QS
 BKebrXnh2xVCN6OIu+eX
 =F0Cc
 -----END PGP SIGNATURE-----

Merge tag 'vfio-for-v3.10' of git://github.com/awilliam/linux-vfio

Pull vfio updates from Alex Williamson:
 "Changes include extension to support PCI AER notification to
  userspace, byte granularity of PCI config space and access to
  unarchitected PCI config space, better protection around IOMMU driver
  accesses, default file mode fix, and a few misc cleanups."

* tag 'vfio-for-v3.10' of git://github.com/awilliam/linux-vfio:
  vfio: Set container device mode
  vfio: Use down_reads to protect iommu disconnects
  vfio: Convert container->group_lock to rwsem
  PCI/VFIO: use pcie_flags_reg instead of access PCI-E Capabilities Register
  vfio-pci: Enable raw access to unassigned config space
  vfio-pci: Use byte granularity in config map
  vfio: make local function vfio_pci_intx_unmask_handler() static
  VFIO-AER: Vfio-pci driver changes for supporting AER
  VFIO: Wrapper for getting reference to vfio_device
2013-05-02 14:02:32 -07:00
Linus Torvalds
96a3e8af5a PCI changes for the v3.10 merge window:
PCI device hotplug
     - Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe)
     - Make acpiphp builtin only, not modular (Jiang Liu)
     - Add acpiphp mutual exclusion (Jiang Liu)
 
   Power management
     - Skip "PME enabled/disabled" messages when not supported (Rafael Wysocki)
     - Fix fallback to PCI_D0 (Rafael Wysocki)
 
   Miscellaneous
     - Factor quirk_io_region (Yinghai Lu)
     - Cache MSI capability offsets & cleanup (Gavin Shan, Bjorn Helgaas)
     - Clean up EISA resource initialization and logging (Bjorn Helgaas)
     - Fix prototype warnings (Andy Shevchenko, Bjorn Helgaas)
     - MIPS: Initialize of_node before scanning bus (Gabor Juhos)
     - Fix pcibios_get_phb_of_node() declaration "weak" annotation (Gabor Juhos)
     - Add MSI INTX_DISABLE quirks for AR8161/AR8162/etc (Xiong Huang)
     - Fix aer_inject return values (Prarit Bhargava)
     - Remove PME/ACPI dependency (Andrew Murray)
     - Use shared PCI_BUS_NUM() and PCI_DEVID() (Shuah Khan)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRfhSWAAoJEFmIoMA60/r8GrYQAIHDsyZIuJSf6g+8Td1h+PIC
 YD3wQhbyrDqQDuKU4+9cz+JsbHmnozUGA4UmlwmOGBxEa/Uauspb6yX1P1+x9Ok1
 WD7Ar3BlA5OuYI/1L1mgCiA428MTujwoR4fPnC0+KFy8xk1tBpmhzzeOFohbKyFF
 hMBO/Xt9tCzPATJ1LhjIH4xAykfDkbnPNHNcUKRoAkRo0CO0lS8gcTk0shXXSNng
 p9kQ6c4cYZvlRIJTwlawWV09nr7mDsBYa3JClqXYZufUWfEwvIuhisJxCJ57sWi9
 t+Ev8dm7VM6Cr5dV+ORArlboBFrq4f/W5U9j9GPFrRplwf+WbNT6tNGSpSDq8XhU
 Q7JjNgPWVdWXe1vIsMwaO49zi45/bNehuCSFLZiyPZwedMk764tys+iYw+tMRtv1
 tBR7lwESSXfagmvWyQAuQOTy6Rj26BPd2T8e2lMsvsuQO9mCyTK6Ey3YyKuqKQK/
 l5Gns4vv4eaCjGXqqDGiydUjSes+r/v1bu43XiRnwPQJUKb5kr5SjN5/zSMBuUgm
 TLT/bnv8qvdFxCpVQJFv4k/uzULARMdbvLtTy8osB14vNHX9jPn+xORjLaZNiO6O
 7fFispMU8Om56hNkD6C451r3icRjjGlD7OA8KOlbZ8f876sLzGV9i6P9gwCoRdEB
 wclDPsN7kAzw/V2sEE60
 =bj8i
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "PCI changes for the v3.10 merge window:

  PCI device hotplug
   - Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe)
   - Make acpiphp builtin only, not modular (Jiang Liu)
   - Add acpiphp mutual exclusion (Jiang Liu)

  Power management
   - Skip "PME enabled/disabled" messages when not supported (Rafael
     Wysocki)
   - Fix fallback to PCI_D0 (Rafael Wysocki)

  Miscellaneous
   - Factor quirk_io_region (Yinghai Lu)
   - Cache MSI capability offsets & cleanup (Gavin Shan, Bjorn Helgaas)
   - Clean up EISA resource initialization and logging (Bjorn Helgaas)
   - Fix prototype warnings (Andy Shevchenko, Bjorn Helgaas)
   - MIPS: Initialize of_node before scanning bus (Gabor Juhos)
   - Fix pcibios_get_phb_of_node() declaration "weak" annotation (Gabor
     Juhos)
   - Add MSI INTX_DISABLE quirks for AR8161/AR8162/etc (Xiong Huang)
   - Fix aer_inject return values (Prarit Bhargava)
   - Remove PME/ACPI dependency (Andrew Murray)
   - Use shared PCI_BUS_NUM() and PCI_DEVID() (Shuah Khan)"

* tag 'pci-v3.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (63 commits)
  vfio-pci: Use cached MSI/MSI-X capabilities
  vfio-pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
  PCI: Remove "extern" from function declarations
  PCI: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
  PCI: Drop msi_mask_reg() and remove drivers/pci/msi.h
  PCI: Use msix_table_size() directly, drop multi_msix_capable()
  PCI: Drop msix_table_offset_reg() and msix_pba_offset_reg() macros
  PCI: Drop is_64bit_address() and is_mask_bit_support() macros
  PCI: Drop msi_data_reg() macro
  PCI: Drop msi_lower_address_reg() and msi_upper_address_reg() macros
  PCI: Drop msi_control_reg() macro and use PCI_MSI_FLAGS directly
  PCI: Use cached MSI/MSI-X offsets from dev, not from msi_desc
  PCI: Clean up MSI/MSI-X capability #defines
  PCI: Use cached MSI-X cap while enabling MSI-X
  PCI: Use cached MSI cap while enabling MSI interrupts
  PCI: Remove MSI/MSI-X cap check in pci_msi_check_device()
  PCI: Cache MSI/MSI-X capability offsets in struct pci_dev
  PCI: Use u8, not int, for PM capability offset
  [SCSI] megaraid_sas: Use correct #define for MSI-X capability
  PCI: Remove "extern" from function declarations
  ...
2013-04-29 09:30:25 -07:00
Bjorn Helgaas
a9047f24df vfio-pci: Use cached MSI/MSI-X capabilities
We now cache the MSI/MSI-X capability offsets in the struct pci_dev,
so no need to find the capabilities again.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
2013-04-24 11:35:56 -06:00
Bjorn Helgaas
508d1aa602 vfio-pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
PCI_MSIX_FLAGS_BIRMASK is mis-named because the BIR mask is in the
Table Offset register, not the flags ("Message Control" per spec)
register.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
2013-04-24 11:35:56 -06:00
Yijing Wang
aa2cba51a0 PCI/VFIO: use pcie_flags_reg instead of access PCI-E Capabilities Register
Currently, we use pcie_flags_reg to cache PCI-E Capabilities Register,
because PCI-E Capabilities Register bits are almost read-only. This patch
use pcie_caps_reg() instead of another access PCI-E Capabilities Register.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-04-15 08:45:10 -06:00
Alex Williamson
a7d1ea1c11 vfio-pci: Enable raw access to unassigned config space
Devices like be2net hide registers between the gaps in capabilities
and architected regions of PCI config space.  Our choices to support
such devices is to either build an ever growing and unmanageable white
list or rely on hardware isolation to protect us.  These registers are
really no different than MMIO or I/O port space registers, which we
don't attempt to regulate, so treat PCI config space in the same way.

Reported-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Gavin Shan <shangw@linux.vnet.ibm.com>
2013-04-01 09:04:12 -06:00
Alex Williamson
180b138107 vfio-pci: Use byte granularity in config map
The config map previously used a byte per dword to map regions of
config space to capabilities.  Modulo a bug where we round the length
of capabilities down instead of up, this theoretically works well and
saves space so long as devices don't try to hide registers in the gaps
between capabilities.  Unfortunately they do exactly that so we need
byte granularity on our config space map.  Increase the allocation of
the config map and split accesses at capability region boundaries.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Gavin Shan <shangw@linux.vnet.ibm.com>
2013-04-01 09:03:44 -06:00
Alex Williamson
904c680c7b vfio-pci: Fix possible integer overflow
The VFIO_DEVICE_SET_IRQS ioctl takes a start and count parameter, both
of which are unsigned.  We attempt to bounds check these, but fail to
account for the case where start is a very large number, allowing
start + count to wrap back into the valid range.  Bounds check both
start and start + count.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-03-26 11:33:16 -06:00
Wei Yongjun
0bced2f728 vfio: make local function vfio_pci_intx_unmask_handler() static
vfio_pci_intx_unmask_handler() was not declared. It should be static.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-03-25 11:15:44 -06:00
Arnd Bergmann
25e9789ddd vfio: include <linux/slab.h> for kmalloc
The vfio drivers call kmalloc or kzalloc, but do not
include <linux/slab.h>, which causes build errors on
ARM.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: kvm@vger.kernel.org
2013-03-15 12:58:20 -06:00
Vijay Mohan Pandarathil
dad9f8972e VFIO-AER: Vfio-pci driver changes for supporting AER
- New VFIO_SET_IRQ ioctl option to pass the eventfd that is signaled when
  an error occurs in the vfio_pci_device

- Register pci_error_handler for the vfio_pci driver

- When the device encounters an error, the error handler registered by
  the vfio_pci driver gets invoked by the AER infrastructure

- In the error handler, signal the eventfd registered for the device.

- This results in the qemu eventfd handler getting invoked and
  appropriate action taken for the guest.

Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-03-11 09:31:22 -06:00
Kees Cook
d65530fbc7 drivers/vfio: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-02-24 09:59:44 -07:00
Alex Williamson
84237a826b vfio-pci: Add support for VGA region access
PCI defines display class VGA regions at I/O port address 0x3b0, 0x3c0
and MMIO address 0xa0000.  As these are non-overlapping, we can ignore
the I/O port vs MMIO difference and expose them both in a single
region.  We make use of the VGA arbiter around each access to
configure chipset access as necessary.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-02-18 10:11:13 -07:00
Alex Williamson
2dd1194833 vfio-pci: Manage user power state transitions
We give the user access to change the power state of the device but
certain transitions result in an uninitialized state which the user
cannot resolve.  To fix this we need to mark the PowerState field of
the PMCSR register read-only and effect the requested change on behalf
of the user.  This has the added benefit that pdev->current_state
remains accurate while controlled by the user.

The primary example of this bug is a QEMU guest doing a reboot where
the device it put into D3 on shutdown and becomes unusable on the next
boot because the device did a soft reset on D3->D0 (NoSoftRst-).

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-02-18 10:10:33 -07:00
Alex Williamson
906ee99dd2 vfio-pci: Cleanup BAR access
We can actually handle MMIO and I/O port from the same access function
since PCI already does abstraction of this.  The ROM BAR only requires
a minor difference, so it gets included too.  vfio_pci_config_readwrite
gets renamed for consistency.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-02-14 14:02:12 -07:00
Alex Williamson
5b279a11d3 vfio-pci: Cleanup read/write functions
The read and write functions are nearly identical, combine them
and convert to a switch statement.  This also makes it easy to
narrow the scope of when we use the io/mem accessors in case new
regions are added.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-02-14 14:02:12 -07:00
Alex Williamson
5641ade41f vfio-pci: Enable PCIe extended capabilities on v1
Even PCIe 1.x had extended config space.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-02-14 10:45:31 -07:00
Alex Williamson
ec1287e511 vfio-pci: Fix buffer overfill
A read from a range hidden from the user (ex. MSI-X vector table)
attempts to fill the user buffer up to the end of the excluded range
instead of up to the requested count.  Fix it.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
2013-01-15 10:45:26 -07:00
Alex Williamson
9a92c5091a vfio-pci: Enable device before attempting reset
Devices making use of PM reset are getting incorrectly identified as
not supporting reset because pci_pm_reset() fails unless the device is
in D0 power state.  When first attached to vfio_pci devices are
typically in an unknown power state.  We can fix this by explicitly
setting the power state or simply calling pci_enable_device() before
attempting a pci_reset_function().  We need to enable the device
anyway, so move this up in our vfio_pci_enable() function, which also
simplifies the error path a bit.

Note that pci_disable_device() does not explicitly set the power
state, so there's no need to re-order vfio_pci_disable().

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-12-07 13:43:51 -07:00
Jiang Liu
05bf3aac93 VFIO: fix out of order labels for error recovery in vfio_pci_init()
The two labels for error recovery in function vfio_pci_init() is out of
order, so fix it.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-12-07 13:43:51 -07:00
Alex Williamson
2007722a60 vfio-pci: Re-order device reset
Move the device reset to the end of our disable path, the device
should already be stopped from pci_disable_device().  This also allows
us to manipulate the save/restore to avoid the save/reset/restore +
save/restore that we had before.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-12-07 13:43:50 -07:00
Fengguang Wu
3a1f7041dd vfio: simplify kmalloc+copy_from_user to memdup_user
Generated by: coccinelle/api/memdup_user.cocci

Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-12-07 13:43:49 -07:00
Alex Williamson
899649b7d4 vfio: Fix PCI INTx disable consistency
The virq_disabled flag tracks the userspace view of INTx masking
across interrupt mode changes, but we're not consistently applying
this to the interrupt and masking handler notion of the device.
Currently if the user sets DisINTx while in MSI or MSIX mode, then
returns to INTx mode (ex. rebooting a qemu guest), the hardware has
DisINTx+, but the management of INTx thinks it's enabled, making it
impossible to actually clear DisINTx.  Fix this by updating the
handler state when INTx is re-enabled.

Cc: stable@vger.kernel.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-10 09:10:32 -06:00
Alex Williamson
9dbdfd23b7 vfio: Move PCI INTx eventfd setting earlier
We need to be ready to recieve an interrupt as soon as we call
request_irq, so our eventfd context setting needs to be moved
earlier.  Without this, an interrupt from our device or one
sharing the interrupt line can pass a NULL into eventfd_signal
and oops.

Cc: stable@vger.kernel.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-10 09:10:32 -06:00
Alex Williamson
34002f54d2 vfio: Fix PCI mmap after b3b9c293
Our mmap path mistakely relied on vma->vm_pgoff to get set in
remap_pfn_range.  After b3b9c293, that path only applies to
copy-on-write mappings.  Set it in our own code.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-10-10 09:10:31 -06:00
Linus Torvalds
547b1e81af Fix staging driver use of VM_RESERVED
The VM_RESERVED flag was killed off in commit 314e51b985 ("mm: kill
vma flag VM_RESERVED and mm->reserved_vm counter"), and replaced by the
proper semantic flags (eg "don't core-dump" etc).  But there was a new
use of VM_RESERVED that got missed by the merge.

Fix the remaining use of VM_RESERVED in the vfio_pci driver, replacing
the VM_RESERVED flag with VM_DONTEXPAND | VM_DONTDUMP.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation,org>
2012-10-09 21:06:41 +09:00
Alex Williamson
b68e7fa879 vfio: Fix virqfd release race
vfoi-pci supports a mechanism like KVM's irqfd for unmasking an
interrupt through an eventfd.  There are two ways to shutdown this
interface: 1) close the eventfd, 2) ioctl (such as disabling the
interrupt).  Both of these do the release through a workqueue,
which can result in a segfault if two jobs get queued for the same
virqfd.

Fix this by protecting the pointer to these virqfds by a spinlock.
The vfio pci device will therefore no longer have a reference to it
once the release job is queued under lock.  On the ioctl side, we
still flush the workqueue to ensure that any outstanding releases
are completed.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-09-21 10:48:28 -06:00
Alex Williamson
89e1f7d4c6 vfio: Add PCI device driver
Add PCI device support for VFIO.  PCI devices expose regions
for accessing config space, I/O port space, and MMIO areas
of the device.  PCI config access is virtualized in the kernel,
allowing us to ensure the integrity of the system, by preventing
various accesses while reducing duplicate support across various
userspace drivers.  I/O port supports read/write access while
MMIO also supports mmap of sufficiently sized regions.  Support
for INTx, MSI, and MSI-X interrupts are provided using eventfds to
userspace.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2012-07-31 08:16:24 -06:00