Commit Graph

24 Commits

Author SHA1 Message Date
Shaohua Li
bfd20f1cc8 x86, iommu/vt-d: Add an option to disable Intel IOMMU force on
IOMMU harms performance signficantly when we run very fast networking
workloads. It's 40GB networking doing XDP test. Software overhead is
almost unaware, but it's the IOTLB miss (based on our analysis) which
kills the performance. We observed the same performance issue even with
software passthrough (identity mapping), only the hardware passthrough
survives. The pps with iommu (with software passthrough) is only about
~30% of that without it. This is a limitation in hardware based on our
observation, so we'd like to disable the IOMMU force on, but we do want
to use TBOOT and we can sacrifice the DMA security bought by IOMMU. I
must admit I know nothing about TBOOT, but TBOOT guys (cc-ed) think not
eabling IOMMU is totally ok.

So introduce a new boot option to disable the force on. It's kind of
silly we need to run into intel_iommu_init even without force on, but we
need to disable TBOOT PMR registers. For system without the boot option,
nothing is changed.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-26 23:57:53 +02:00
David Woodhouse
907fea3491 iommu/vt-d: Implement deferred invalidate for SVM
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-10-15 13:22:35 +01:00
David Woodhouse
2f26e0a9c9 iommu/vt-d: Add basic SVM PASID support
This provides basic PASID support for endpoint devices, tested with a
version of the i915 driver.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2015-10-15 12:55:45 +01:00
Jiang Liu
a868e6b7b6 iommu/vt-d: keep shared resources when failed to initialize iommu devices
Data structure drhd->iommu is shared between DMA remapping driver and
interrupt remapping driver, so DMA remapping driver shouldn't release
drhd->iommu when it failed to initialize IOMMU devices. Otherwise it
may cause invalid memory access to the interrupt remapping driver.

Sample stack dump:
[   13.315090] BUG: unable to handle kernel paging request at ffffc9000605a088
[   13.323221] IP: [<ffffffff81461bac>] qi_submit_sync+0x15c/0x400
[   13.330107] PGD 82f81e067 PUD c2f81e067 PMD 82e846067 PTE 0
[   13.336818] Oops: 0002 [#1] SMP
[   13.340757] Modules linked in:
[   13.344422] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.13.0-rc1-gerry+ #7
[   13.352474] Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T,                                               BIOS SE5C600.86B.99.99.x059.091020121352 09/10/2012
[   13.365659] Workqueue: events work_for_cpu_fn
[   13.370774] task: ffff88042ddf00d0 ti: ffff88042ddee000 task.ti: ffff88042dde                                              e000
[   13.379389] RIP: 0010:[<ffffffff81461bac>]  [<ffffffff81461bac>] qi_submit_sy                                              nc+0x15c/0x400
[   13.389055] RSP: 0000:ffff88042ddef940  EFLAGS: 00010002
[   13.395151] RAX: 00000000000005e0 RBX: 0000000000000082 RCX: 0000000200000025
[   13.403308] RDX: ffffc9000605a000 RSI: 0000000000000010 RDI: ffff88042ddb8610
[   13.411446] RBP: ffff88042ddef9a0 R08: 00000000000005d0 R09: 0000000000000001
[   13.419599] R10: 0000000000000000 R11: 000000000000005d R12: 000000000000005c
[   13.427742] R13: ffff88102d84d300 R14: 0000000000000174 R15: ffff88042ddb4800
[   13.435877] FS:  0000000000000000(0000) GS:ffff88043de00000(0000) knlGS:00000                                              00000000000
[   13.445168] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   13.451749] CR2: ffffc9000605a088 CR3: 0000000001a0b000 CR4: 00000000000407f0
[   13.459895] Stack:
[   13.462297]  ffff88042ddb85d0 000000000000005d ffff88042ddef9b0 0000000000000                                              5d0
[   13.471147]  00000000000005c0 ffff88042ddb8000 000000000000005c 0000000000000                                              015
[   13.480001]  ffff88042ddb4800 0000000000000282 ffff88042ddefa40 ffff88042ddef                                              ac0
[   13.488855] Call Trace:
[   13.491771]  [<ffffffff8146848d>] modify_irte+0x9d/0xd0
[   13.497778]  [<ffffffff8146886d>] intel_setup_ioapic_entry+0x10d/0x290
[   13.505250]  [<ffffffff810a92a6>] ? trace_hardirqs_on_caller+0x16/0x1e0
[   13.512824]  [<ffffffff810346b0>] ? default_init_apic_ldr+0x60/0x60
[   13.519998]  [<ffffffff81468be0>] setup_ioapic_remapped_entry+0x20/0x30
[   13.527566]  [<ffffffff8103683a>] io_apic_setup_irq_pin+0x12a/0x2c0
[   13.534742]  [<ffffffff8136673b>] ? acpi_pci_irq_find_prt_entry+0x2b9/0x2d8
[   13.544102]  [<ffffffff81037fd5>] io_apic_setup_irq_pin_once+0x85/0xa0
[   13.551568]  [<ffffffff8103816f>] ? mp_find_ioapic_pin+0x8f/0xf0
[   13.558434]  [<ffffffff81038044>] io_apic_set_pci_routing+0x34/0x70
[   13.565621]  [<ffffffff8102f4cf>] mp_register_gsi+0xaf/0x1c0
[   13.572111]  [<ffffffff8102f5ee>] acpi_register_gsi_ioapic+0xe/0x10
[   13.579286]  [<ffffffff8102f33f>] acpi_register_gsi+0xf/0x20
[   13.585779]  [<ffffffff81366b86>] acpi_pci_irq_enable+0x171/0x1e3
[   13.592764]  [<ffffffff8146d771>] pcibios_enable_device+0x31/0x40
[   13.599744]  [<ffffffff81320e9b>] do_pci_enable_device+0x3b/0x60
[   13.606633]  [<ffffffff81322248>] pci_enable_device_flags+0xc8/0x120
[   13.613887]  [<ffffffff813222f3>] pci_enable_device+0x13/0x20
[   13.620484]  [<ffffffff8132fa7e>] pcie_port_device_register+0x1e/0x510
[   13.627947]  [<ffffffff810a92a6>] ? trace_hardirqs_on_caller+0x16/0x1e0
[   13.635510]  [<ffffffff810a947d>] ? trace_hardirqs_on+0xd/0x10
[   13.642189]  [<ffffffff813302b8>] pcie_portdrv_probe+0x58/0xc0
[   13.648877]  [<ffffffff81323ba5>] local_pci_probe+0x45/0xa0
[   13.655266]  [<ffffffff8106bc44>] work_for_cpu_fn+0x14/0x20
[   13.661656]  [<ffffffff8106fa79>] process_one_work+0x369/0x710
[   13.668334]  [<ffffffff8106fa02>] ? process_one_work+0x2f2/0x710
[   13.675215]  [<ffffffff81071d56>] ? worker_thread+0x46/0x690
[   13.681714]  [<ffffffff81072194>] worker_thread+0x484/0x690
[   13.688109]  [<ffffffff81071d10>] ? cancel_delayed_work_sync+0x20/0x20
[   13.695576]  [<ffffffff81079c60>] kthread+0xf0/0x110
[   13.701300]  [<ffffffff8108e7bf>] ? local_clock+0x3f/0x50
[   13.707492]  [<ffffffff81079b70>] ? kthread_create_on_node+0x250/0x250
[   13.714959]  [<ffffffff81574d2c>] ret_from_fork+0x7c/0xb0
[   13.721152]  [<ffffffff81079b70>] ? kthread_create_on_node+0x250/0x250

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
2014-01-09 12:43:40 +01:00
Eugeni Dodonov
8bc1f85c02 iommu: Export intel_iommu_enabled to signal when iommu is in use
In i915 driver, we do not enable either rc6 or semaphores on SNB when dmar
is enabled. The new 'intel_iommu_enabled' variable signals when the
iommu code is in operation.

Cc: Ted Phelps <phelps@gnusto.com>
Cc: Peter <pab1612@gmail.com>
Cc: Lukas Hejtmanek <xhejtman@fi.muni.cz>
Cc: Andrew Lutomirski <luto@mit.edu>
CC: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
2011-12-16 08:49:57 -08:00
Suresh Siddha
d3f138106b iommu: Rename the DMAR and INTR_REMAP config options
Change the CONFIG_DMAR to CONFIG_INTEL_IOMMU to be consistent
with the other IOMMU options.

Rename the CONFIG_INTR_REMAP to CONFIG_IRQ_REMAP to match the
irq subsystem name.

And define the CONFIG_DMAR_TABLE for the common ACPI DMAR
routines shared by both CONFIG_INTEL_IOMMU and CONFIG_IRQ_REMAP.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: youquan.song@intel.com
Cc: joerg.roedel@amd.com
Cc: tony.luck@intel.com
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/20110824001456.558630224@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-21 10:22:03 +02:00
Suresh Siddha
f5d1b97bcd iommu: Cleanup ifdefs in detect_intel_iommu()
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: youquan.song@intel.com
Cc: joerg.roedel@amd.com
Cc: tony.luck@intel.com
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/20110824001456.386003047@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-21 10:21:57 +02:00
Suresh Siddha
318fe7df9d iommu: Move IOMMU specific code to intel-iommu.c
Move the IOMMU specific routines to intel-iommu.c leaving the
dmar.c to the common ACPI dmar code shared between DMA-remapping
and Interrupt-remapping.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: youquan.song@intel.com
Cc: joerg.roedel@amd.com
Cc: tony.luck@intel.com
Cc: dwmw2@infradead.org
Link: http://lkml.kernel.org/r/20110824001456.282401285@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-21 10:21:54 +02:00
Youquan Song
6dd9a7c737 intel-iommu: Enable super page (2MiB, 1GiB, etc.) support
There are no externally-visible changes with this. In the loop in the
internal __domain_mapping() function, we simply detect if we are mapping:
  - size >= 2MiB, and
  - virtual address aligned to 2MiB, and
  - physical address aligned to 2MiB, and
  - on hardware that supports superpages.

(and likewise for larger superpages).

We automatically use a superpage for such mappings. We never have to
worry about *breaking* superpages, since we trust that we will always
*unmap* the same range that was mapped. So all we need to do is ensure
that dma_pte_clear_range() will also cope with superpages.

Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so
it can return a PTE at the appropriate level rather than always
extending the page tables all the way down to level 1. Again, this is
simplified by the fact that we should never encounter existing small
pages when we're creating a mapping; any old mapping that used the same
virtual range will have been entirely removed and its obsolete page
tables freed.

Provide an 'intel_iommu=sp_off' argument on the command line as a
chicken bit. Not that it should ever be required.

==

The original commit seen in the iommu-2.6.git was Youquan's
implementation (and completion) of my own half-baked code which I'd
typed into an email. Followed by half a dozen subsequent 'fixes'.

I've taken the unusual step of rewriting history and collapsing the
original commits in order to keep the main history simpler, and make
life easier for the people who are going to have to backport this to
older kernels. And also so I can give it a more coherent commit comment
which (hopefully) gives a better explanation of what's going on.

The original sequence of commits leading to identical code was:

Youquan Song (3):
      intel-iommu: super page support
      intel-iommu: Fix superpage alignment calculation error
      intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte()

David Woodhouse (4):
      intel-iommu: Precalculate superpage support for dmar_domain
      intel-iommu: Fix hardware_largepage_caps()
      intel-iommu: Fix inappropriate use of superpages in __domain_mapping()
      intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages

Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01 12:26:35 +01:00
Yu Zhao
93a23a7271 VT-d: support the device IOTLB
Enable the device IOTLB (i.e. ATS) for both the bare metal and KVM
environments.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-05-18 14:46:26 +01:00
Fenghua Yu
4ed0d3e6c6 Intel IOMMU Pass Through Support
The patch adds kernel parameter intel_iommu=pt to set up pass through
mode in context mapping entry. This disables DMAR in linux kernel; but
KVM still runs on VT-d and interrupt remapping still works.

In this mode, kernel uses swiotlb for DMA API functions but other VT-d
functionalities are enabled for KVM. KVM always uses multi level
translation page table in VT-d. By default, pass though mode is disabled
in kernel.

This is useful when people don't want to enable VT-d DMAR in kernel but
still want to use KVM and interrupt remapping for reasons like DMAR
performance concern or debug purpose.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Weidong Han <weidong@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-04-29 06:54:34 +01:00
Sheng Yang
9cf0669746 intel-iommu: VT-d page table to support snooping control bit
The user can request to enable snooping control through VT-d page table.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-03-24 09:42:54 +00:00
Ingo Molnar
c66b9906f8 intel-iommu: fix build error with INTR_REMAP=y and DMAR=n
dmar.o can be built in the CONFIG_INTR_REMAP=y case but
iommu_calculate_agaw() is only available if VT-d is built as well.

So create an inline version of iommu_calculate_agaw() for the
!CONFIG_DMAR case. The iommu->agaw value wont be used in this
case, but the code is cleaner (has less #ifdefs) if we have it around
unconditionally.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-04 11:00:05 +01:00
Weidong Han
1b5736839a calculate agaw for each iommu
"SAGAW" capability may be different across iommus. Use a default agaw, but if default agaw is not supported in some iommus, choose a less supported agaw.

Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-01-03 14:02:18 +01:00
Mark McLoughlin
2abd7e167c intel-iommu: move iommu_prepare_gfx_mapping() out of dma_remapping.h
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:35 +01:00
Mark McLoughlin
a647dacbb1 intel-iommu: move struct device_domain_info out of dma_remapping.h
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:35 +01:00
Mark McLoughlin
99126f7ce1 intel-iommu: move struct dmar_domain def out dma_remapping.h
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:35 +01:00
Mark McLoughlin
622ba12a4c intel-iommu: move DMA PTE defs out of dma_remapping.h
DMA_PTE_READ/WRITE are needed by kvm.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:35 +01:00
Mark McLoughlin
7a8fc25e0c intel-iommu: move context entry defs out from dma_remapping.h
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:35 +01:00
Mark McLoughlin
46b08e1a76 intel-iommu: move root entry defs from dma_remapping.h
We keep the struct root_entry forward declaration for the
pointer in struct intel_iommu.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:35 +01:00
Mark McLoughlin
f27be03b27 intel-iommu: move DMA_32/64BIT_PFN into intel-iommu.c
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:34 +01:00
Mark McLoughlin
519a054915 intel-iommu: make init_dmars() static
init_dmars() is not used outside of drivers/pci/intel-iommu.c

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-03 11:57:34 +01:00
Fenghua Yu
5b6985ce8e intel-iommu: IA64 support
The current Intel IOMMU code assumes that both host page size and Intel
IOMMU page size are 4KiB. The first patch supports variable page size.
This provides support for IA64 which has multiple page sizes.

This patch also adds some other code hooks for IA64 platform including
DMAR_OPERATION_TIMEOUT definition.

[dwmw2: some cleanup]
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2008-10-18 14:29:15 +01:00
Kay, Allen M
3871794642 VT-d: Changes to support KVM
This patch extends the VT-d driver to support KVM

[Ben: fixed memory pinning]
[avi: move dma_remapping.h as well]

Signed-off-by: Kay, Allen M <allen.m.kay@intel.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Acked-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 14:24:08 +02:00