kvm_assigned_dev_ack_irq is vulnerable to a race condition with the
interrupt handler function. It does:
if (dev->host_irq_disabled) {
enable_irq(dev->host_irq);
dev->host_irq_disabled = false;
}
If an interrupt triggers before the host->dev_irq_disabled assignment,
it will disable the interrupt and set dev->host_irq_disabled to true.
On return to kvm_assigned_dev_ack_irq, dev->host_irq_disabled is set to
false, and the next kvm_assigned_dev_ack_irq call will fail to reenable
it.
Other than that, having the interrupt handler and work handlers run in
parallel sounds like asking for trouble (could not spot any obvious
problem, but better not have to, its fragile).
CC: sheng.yang@intel.com
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
KVM uses a function call IPI to cause the exit of a guest running on a
physical cpu. For virtual interrupt notification there is no need to
wait on IPI receival, or to execute any function.
This is exactly what the reschedule IPI does, without the overhead
of function IPI. So use it instead of smp_call_function_single in
kvm_vcpu_kick.
Also change the "guest_mode" variable to a bit in vcpu->requests, and
use that to collapse multiple IPI's that would be issued between the
first one and zeroing of guest mode.
This allows kvm_vcpu_kick to called with interrupts disabled.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Memory aliases with different memory type is a problem for guest. For the guest
without assigned device, the memory type of guest memory would always been the
same as host(WB); but for the assigned device, some part of memory may be used
as DMA and then set to uncacheable memory type(UC/WC), which would be a conflict of
host memory type then be a potential issue.
Snooping control can guarantee the cache correctness of memory go through the
DMA engine of VT-d.
[avi: fix build on ia64]
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Two things needed fixing: 1) g++ does not allow a named structure type
within an anonymous union and 2) Avoid name clash between two padding
fields within the same struct by giving them different names as is
done elsewhere in the header.
Signed-off-by: Nathan Binkert <nate@binkert.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
kvm_vcpu_block() unhalts vpu on an interrupt/timer without checking
if interrupt window is actually opened.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
After discussion with Marcelo, we decided to rework device assignment framework
together. The old problems are kernel logic is unnecessary complex. So Marcelo
suggest to split it into a more elegant way:
1. Split host IRQ assign and guest IRQ assign. And userspace determine the
combination. Also discard msi2intx parameter, userspace can specific
KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to
enable MSI to INTx convertion.
2. Split assign IRQ and deassign IRQ. Import two new ioctls:
KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ.
This patch also fixed the reversed _IOR vs _IOW in definition(by deprecated the
old interface).
[avi: replace homemade bitcount() by hweight_long()]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Use kvm_apic_match_dest() in kvm_get_intr_delivery_bitmask() instead
of duplicating the same code. Use kvm_get_intr_delivery_bitmask() in
apic_send_ipi() to figure out ipi destination instead of reimplementing
the logic.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
ioapic_deliver() and kvm_set_msi() have code duplication. Move
the code into ioapic_deliver_entry() function and call it from
both places.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Since "KVM: Unify the delivery of IOAPIC and MSI interrupts"
I get the following warnings:
CC [M] arch/s390/kvm/kvm-s390.o
In file included from arch/s390/kvm/kvm-s390.c:22:
include/linux/kvm_host.h:357: warning: 'struct kvm_ioapic' declared inside parameter list
include/linux/kvm_host.h:357: warning: its scope is only this definition or declaration, which is probably not what you want
This patch limits IOAPIC functions for architectures that have one.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch finally enable MSI-X.
What we need for MSI-X:
1. Intercept one page in MMIO region of device. So that we can get guest desired
MSI-X table and set up the real one. Now this have been done by guest, and
transfer to kernel using ioctl KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY.
2. Information for incoming interrupt. Now one device can have more than one
interrupt, and they are all handled by one workqueue structure. So we need to
identify them. The previous patch enable gsi_msg_pending_bitmap get this done.
3. Mapping from host IRQ to guest gsi as well as guest gsi to real MSI/MSI-X
message address/data. We used same entry number for the host and guest here, so
that it's easy to find the correlated guest gsi.
What we lack for now:
1. The PCI spec said nothing can existed with MSI-X table in the same page of
MMIO region, except pending bits. The patch ignore pending bits as the first
step (so they are always 0 - no pending).
2. The PCI spec allowed to change MSI-X table dynamically. That means, the OS
can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it. The patch
didn't support this, and Linux also don't work in this way.
3. The patch didn't implement MSI-X mask all and mask single entry. I would
implement the former in driver/pci/msi.c later. And for single entry, userspace
should have reposibility to handle it.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
We have to handle more than one interrupt with one handler for MSI-X. Avi
suggested to use a flag to indicate the pending. So here is it.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Introduce KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY two ioctls.
This two ioctls are used by userspace to specific guest device MSI-X entry
number and correlate MSI-X entry with GSI during the initialization stage.
MSI-X should be well initialzed before enabling.
Don't support change MSI-X entry number for now.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 5543/1: arm: serial amba: add missing declaration in serial.h
[ARM] pxa: fix pxa27x_udc default pullup GPIO
[ARM] pxa/imote2: fix UCAM sensor board ADC model number
mx[23]: don't put clock lookups in __initdata
fix oops when using console=ttymxcN with N > 0
[ARM] ARMv7 errata: only apply fixes when running on applicable CPU
[ARM] 5534/1: kmalloc must return a cache line aligned buffer
This header is sometimes included in the uncompress stage to get
register values, but no <linux/amba/bus.h> can be included there.
So declare "struct amba_device" here before using it in a prototype.
Signed-off-by: Alessandro Rubini <rubini@unipv.it>
Acked-by: Andrea Gallo <andrea.gallo@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The "trace || CLONE_PTRACE" check in tracehook_report_clone() is not right,
- If the untraced task does clone(CLONE_PTRACE) the new child is not traced,
we must not queue SIGSTOP.
- If we forked the traced task, but the tracer exits and untraces both the
forking task and the new child (after copy_process() drops tasklist_lock),
we should not queue SIGSTOP too.
Change the code to check task_ptrace() != 0 instead. This is still racy, but
the race is harmless.
We can race with another tracer attaching to this child, or the tracer can
exit and detach in parallel. But giwen that we didn't do wake_up_new_task()
yet, the child must have the pending SIGSTOP anyway.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ideally we should have a directory of drivers and a link to the 'active'
driver. For now just show the first device which is effectively the existing
semantics without a warning.
This is an update on the original buggy patch that I then forgot to
resubmit. Confusingly it was proposed by Red Hat, written by Etched Pixels
fixed and submitted by Intel ...
Resolves-Bug: http://bugzilla.kernel.org/show_bug.cgi?id=9749
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fix the following 'make headers_check' warnings:
usr/include/linux/net_dropmon.h:7: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
fix the following 'make headers_check' warnings:
usr/include/linux/auto_fs.h:17: include of <linux/types.h> is preferred over <asm/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: libps2 - better handle bad scheduler decisions
Input: usb1400_ts - fix access to "device data" in resume function
Input: multitouch - augment event semantics documentation
Input: multitouch - add tracking ID to the protocol
mapping->tree_lock can be acquired from interrupt context. Then,
following dead lock can occur.
Assume "A" as a page.
CPU0:
lock_page_cgroup(A)
interrupted
-> take mapping->tree_lock.
CPU1:
take mapping->tree_lock
-> lock_page_cgroup(A)
This patch tries to fix above deadlock by moving memcg's hook to out of
mapping->tree_lock. charge/uncharge of pagecache/swapcache is protected
by page lock, not tree_lock.
After this patch, lock_page_cgroup() is not called under mapping->tree_lock.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
linux/cred.h can't be included as first header (alphabetical order)
because it uses __init which is enough to break compilation on some archs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Testing the i7300_idle driver on i5000-series hardware required
an edit to i7300_idle.h to "#define SUPPORT_I5000 1" and a re-build
of both i7300_idle and ioat_dma.
Replace that build-time scheme with a load-time module parameter:
"7300_idle.forceload=1" to make it easier to test the driver
on hardware that while not officially validated, works fine
and is much more commonly available.
By default (no modparam) the driver will continue to load
only on the i7300.
Note that ioat_dma runs a copy of i7300_idle's probe routine
to know to reserve an IOAT channel for i7300_idle.
This change makes ioat_dma do that always on the i5000,
just like it does on the i7300.
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Andrew Henroid <andrew.d.henroid@intel.com>
Robert L Mathews discovered that some clients send evil TCP RST segments,
which are accepted by netfilter conntrack but discarded by the
destination. Thus the conntrack entry is destroyed but the destination
retransmits data until timeout.
The same technique, i.e. sending properly crafted RST segments, can easily
be used to bypass connlimit/connbytes based restrictions (the sample
script written by Robert can be found in the netfilter mailing list
archives).
The patch below adds a new flag and new field to struct ip_ct_tcp_state so
that checking RST segments can be made more strict and thus TCP conntrack
can catch the invalid ones: the RST segment is accepted only if its
sequence number higher than or equal to the highest ack we seen from the
other direction. (The last_ack field cannot be reused because it is used
to catch resent packets.)
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
There are a few multi-touch devices that support finger tracking
well in hardware, Stantum being the prime example. By exposing the
tracking ID in the MT protocol, evdev bandwidth and cpu usage in
user space can be reduced.
This patch adds the ABS_MT_TRACKING_ID to the MT protocol.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Tested-by: Stéphane Chatty <chatty@enac.fr>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
This patch adds the PCI Device ID 0xc409 to the PCI ID table of via82cxxx.c,
as well as the 0x8409 south bridge ID.
This is required to make the IDE driver work on the VX855/VX875 integrated
chipset.
Signed-off-by: Harald Welte <HaraldWelte@viatech.com>
Cc: Joseph Chan <JosephChan@via.com.tw>
Cc: Bruce Chang <BruceChang@via.com.tw>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
The second argument of the probe method points to the amba_id
structure, so it's better passed with the correct type. None of the
current in-tree drivers uses the pointer, so they have only been
checked for a clean compile.
Change suggested by Russell King.
Signed-off-by: Alessandro Rubini <rubini@unipv.it>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
pfn_valid() is meant to be able to tell if a given PFN has valid memmap
associated with it or not. In FLATMEM, it is expected that holes always
have valid memmap as long as there is valid PFNs either side of the hole.
In SPARSEMEM, it is assumed that a valid section has a memmap for the
entire section.
However, ARM and maybe other embedded architectures in the future free
memmap backing holes to save memory on the assumption the memmap is never
used. The page_zone linkages are then broken even though pfn_valid()
returns true. A walker of the full memmap must then do this additional
check to ensure the memmap they are looking at is sane by making sure the
zone and PFN linkages are still valid. This is expensive, but walkers of
the full memmap are extremely rare.
This was caught before for FLATMEM and hacked around but it hits again for
SPARSEMEM because the page_zone linkages can look ok where the PFN linkages
are totally screwed. This looks like a hatchet job but the reality is that
any clean solution would end up consumning all the memory saved by punching
these unexpected holes in the memmap. For example, we tried marking the
memmap within the section invalid but the section size exceeds the size of
the hole in most cases so pfn_valid() starts returning false where valid
memmap exists. Shrinking the size of the section would increase memory
consumption offsetting the gains.
This patch identifies when an architecture is punching unexpected holes
in the memmap that the memory model cannot automatically detect and sets
ARCH_HAS_HOLES_MEMORYMODEL. At the moment, this is restricted to EP93xx
which is the model sub-architecture this has been reported on but may expand
later. When set, walkers of the full memmap must call memmap_valid_within()
for each PFN and passing in what it expects the page and zone to be for
that PFN. If it finds the linkages to be broken, it assumes the memmap is
invalid for that PFN.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This adds CONFIG_REISERFS_FS_XATTR protection from reiserfs_permission.
This is needed to avoid warnings during file deletions and chowns with
xattrs disabled.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch provides new heuristics for parsing both the form factor and
media rotation rate ATA IDENFITY words.
The reported ATA version must be 7 or greater and the device must return
values defined as valid in the standard. Only then are the
characteristics reported to SCSI via the VPD B1 page.
This seems like a reasonable compromise to me considering that we have
been shipping several kernel releases that key off the rotation rate bit
without any version checking whatsoever. With no complaints so far.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
Revert "mm: add /proc controls for pdflush threads"
viocd: needs to depend on BLOCK
block: fix the bio_vec array index out-of-bounds test
This reverts commit fafd688e4c.
Work is progressing to switch away from pdflush as the process backing
for flushing out dirty data. So it seems pointless to add more knobs
to control pdflush threads. The original author of the patch did not
have any specific use cases for adding the knobs, so we can easily
revert this before 2.6.30 to avoid having to maintain this API
forever.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
dma: fix ipu_idmac.c to not discard the last queued buffer
ioatdma: fix "ioatdma frees DMA memory with wrong function"
ipu_idmac: Use disable_irq_nosync() from within irq handlers.
dmatest: fix max channels handling
as reported by Alexander Beregalov <a.beregalov@gmail.com>
ioatdma 0000:00:08.0: DMA-API: device driver frees DMA memory with
wrong function [device address=0x000000007f76f800] [size=2000 bytes]
[map
ped as single] [unmapped as page]
The ioatdma driver was unmapping all regions
(either allocated as page or single) using unmap_page.
This patch lets dma driver recognize if unmap_single or unmap_page should be used.
It introduces two new dma control flags:
DMA_COMPL_SRC_UNMAP_SINGLE and DMA_COMPL_DEST_UNMAP_SINGLE.
They should be set to indicate dma driver to do dma-unmapping as single
(first one for the source, tha latter for the destination).
If respective flag is not set, the driver assumes dma-unmapping as page.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In order to build the generic syscall table, we need a declaration for
every system call. sys_pipe2 was added without a proper declaration, so
add this to syscalls.h now.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
Revert driver core: move platform_data into platform_device
Revert driver core: fix passing platform_data
Remove old PRINTK_DEBUG config item
Doc/sysfs-rules: Swap the order of the words so the sentence makes more sense
Driver core: platform: fix kernel-doc warnings
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (22 commits)
Fix the race between capifs remount and node creation
Fix races around the access to ->s_options
switch ufs directories to ufs_sync_file()
Switch open_exec() and sys_uselib() to do_open_filp()
Make open_exec() and sys_uselib() use may_open(), instead of duplicating its parts
Reduce path_lookup() abuses
Make checkpatch.pl shut up on fs/inode.c
NULL noise in fs/super.c:kill_bdev_super()
romfs: cleanup romfs_fs.h
ROMFS: romfs_dev_read() error ignored
fs: dcache fix LRU ordering
ocfs2: Use nd_set_link().
Fix deadlock in ipathfs ->get_sb()
Fix a leak in failure exit in 9p ->get_sb()
Convert obvious places to deactivate_locked_super()
New helper: deactivate_locked_super()
reiserfs: remove privroot hiding in lookup
reiserfs: dont associate security.* with xattr files
reiserfs: fixup xattr_root caching
Always lookup priv_root on reiserfs mount and keep it
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits)
bonding: fix panic if initialization fails
IXP4xx: complete Ethernet netdev setup before calling register_netdev().
IXP4xx: use "ENODEV" instead of "ENOSYS" in module initialization.
ipvs: Fix IPv4 FWMARK virtual services
ipv4: Make INET_LRO a bool instead of tristate.
net: remove stale reference to fastroute from Kconfig help text
net: update skb_recycle_check() for hardware timestamping changes
bnx2: Fix panic in bnx2_poll_work().
net-sched: fix bfifo default limit
igb: resolve panic on shutdown when SR-IOV is enabled
wimax: oops: wimax_dev_add() is the only one that can initialize the state
wimax: fix oops if netlink fails to add attribute
Bluetooth: Move dev_set_name() to a context that can sleep
netfilter: ctnetlink: fix wrong message type in user updates
netfilter: xt_cluster: fix use of cluster match with 32 nodes
netfilter: ip6t_ipv6header: fix match on packets ending with NEXTHDR_NONE
netfilter: add missing linux/types.h include to xt_LED.h
mac80211: pid, fix memory corruption
mac80211: minstrel, fix memory corruption
cfg80211: fix comment on regulatory hint processing
...
Put generic_show_options read access to s_options under rcu_read_lock,
split save_mount_options() into "we are setting it the first time"
(uses in foo_fill_super()) and "we are relacing and freeing the old one",
synchronize_rcu() before kfree() in the latter.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>