linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-30 17:36:41 +07:00

Author	SHA1	Message	Date
Nick Piggin	d00806b183	mm: fix fault vs invalidate race for linear mappings Fix the race between invalidate_inode_pages and do_no_page. Andrea Arcangeli identified a subtle race between invalidation of pages from pagecache with userspace mappings, and do_no_page. The issue is that invalidation has to shoot down all mappings to the page, before it can be discarded from the pagecache. Between shooting down ptes to a particular page, and actually dropping the struct page from the pagecache, do_no_page from any process might fault on that page and establish a new mapping to the page just before it gets discarded from the pagecache. The most common case where such invalidation is used is in file truncation. This case was catered for by doing a sort of open-coded seqlock between the file's i_size, and its truncate_count. Truncation will decrease i_size, then increment truncate_count before unmapping userspace pages; do_no_page will read truncate_count, then find the page if it is within i_size, and then check truncate_count under the page table lock and back out and retry if it had subsequently been changed (ptl will serialise against unmapping, and ensure a potentially updated truncate_count is actually visible). Complexity and documentation issues aside, the locking protocol fails in the case where we would like to invalidate pagecache inside i_size. do_no_page can come in anytime and filemap_nopage is not aware of the invalidation in progress (as it is when it is outside i_size). The end result is that dangling (->mapping == NULL) pages that appear to be from a particular file may be mapped into userspace with nonsense data. Valid mappings to the same place will see a different page. Andrea implemented two working fixes, one using a real seqlock, another using a page->flags bit. He also proposed using the page lock in do_no_page, but that was initially considered too heavyweight. However, it is not a global or per-file lock, and the page cacheline is modified in do_no_page to increment _count and _mapcount anyway, so a further modification should not be a large performance hit. Scalability is not an issue. This patch implements this latter approach. ->nopage implementations return with the page locked if it is possible for their underlying file to be invalidated (in that case, they must set a special vm_flags bit to indicate so). do_no_page only unlocks the page after setting up the mapping completely. invalidation is excluded because it holds the page lock during invalidation of each page (and ensures that the page is not mapped while holding the lock). This also allows significant simplifications in do_no_page, because we have the page locked in the right place in the pagecache from the start. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-19 10:04:41 -07:00
Linus Torvalds	ce524c8360	Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: eHEA: Fix bonding support Blackfin ethernet driver: on chip ethernet MAC controller driver fix wrong argument of tc35815_read_plat_dev_addr() ARM/ETHER3: Handle multicast frames. SAA9730: Handle multicast frames. NI5010: Handle multicast frames. NS83820: Handle multicast frames. Fix RGMII-ID handling in gianfar Fix Vitesse RGMII-ID support Add phy-connection-type to gianfar nodes Fix Vitesse 824x PHY interrupt acking [PATCH] zd1211rw: Add ID for Siemens Gigaset USB Stick 54 [PATCH] zd1211rw: Add ID for Planex GW-US54GXS [PATCH] Update version ipw2200 stamp to 1.2.2 [PATCH] ipw2200: Fix ipw_isr() comments error on shared IRQ [PATCH] Fix ipw2200 set wrong power parameter causing firmware error [PATCH] ipw2100: Fix `iwpriv set_power` error [PATCH] softmac: Channel is listed twice in scan output	2007-07-18 18:33:45 -07:00
Linus Torvalds	29e7ee378e	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: sysfs: cosmetic clean up on node creation failure paths sysfs: kill an extra put in sysfs_create_link() failure path Driver core: check return code of sysfs_create_link() HOWTO: Add the knwon_regression URI to the documentation dev_vdbg() documentation dev_vdbg(), available with -DVERBOSE_DEBUG sysfs: make sysfs_init_inode() static sysfs: fix sysfs root inode nlink accounting Documentation fix devres.txt: lib/iomap.c -> lib/devres.c sysfs: avoid kmem_cache_free(NULL) PM: remove deprecated dpm_runtime_* routines PM: Remove deprecated sysfs files Driver core: accept all valid action-strings in uevent-trigger debugfs: remove rmdir() non-empty complaint	2007-07-18 18:28:08 -07:00
Linus Torvalds	fc15bc817e	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/uio-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/uio-2.6: UIO: Hilscher CIF card driver UIO: Documentation UIO: Add the User IO core code	2007-07-18 18:27:50 -07:00
Linus Torvalds	a8dcf12f9e	Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux * 'for-linus' of git://linux-nfs.org/~bfields/linux: locks: fix vfs_test_lock() comment locks: make posix_test_lock() interface more consistent nfs: disable leases over NFS gfs2: stop giving out non-cluster-coherent leases locks: export setlease to filesystems locks: provide a file lease method enabling cluster-coherent leases locks: rename lease functions to reflect locks.c conventions locks: share more common lease code locks: clean up lease_alloc() locks: convert an -EINVAL return to a BUG leases: minor break_lease() comment clarification	2007-07-18 18:27:00 -07:00
J. Bruce Fields	6d34ac199a	locks: make posix_test_lock() interface more consistent Since posix_test_lock(), like fcntl() and ->lock(), indicates absence or presence of a conflict lock by setting fl_type to, respectively, F_UNLCK or something other than F_UNLCK, the return value is no longer needed. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:17:19 -04:00
J. Bruce Fields	4698afe8e3	locks: export setlease to filesystems Export setlease so it can used by filesystems to implement their lease methods. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:17:06 -04:00
J. Bruce Fields	f9ffed26d6	locks: provide a file lease method enabling cluster-coherent leases Currently leases are only kept locally, so there's no way for a distributed filesystem to enforce them against multiple clients. We're particularly interested in the case of nfsd exporting a cluster filesystem, in which case nfsd needs cluster-coherent leases in order to implement delegations correctly. Also add some documentation. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-07-18 19:14:47 -04:00
J. Bruce Fields	a9933cea7a	locks: rename lease functions to reflect locks.c conventions We've been using the convention that vfs_foo is the function that calls a filesystem-specific foo method if it exists, or falls back on a generic method if it doesn't; thus vfs_foo is what is called when some other part of the kernel (normally lockd or nfsd) wants to get a lock, whereas foo is what filesystems call to use the underlying local functionality as part of their lock implementation. So rename setlease to vfs_setlease (which will call a filesystem-specific setlease after a later patch) and __setlease to setlease. Also, vfs_setlease need only be GPL-exported as long as it's only needed by lockd and nfsd. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-07-18 19:14:12 -04:00
Hans J. Koch	beafc54c4e	UIO: Add the User IO core code This interface allows the ability to write the majority of a driver in userspace with only a very small shell of a driver in the kernel itself. It uses a char device and sysfs to interact with a userspace process to process interrupts and control memory accesses. See the docbook documentation for more details on how to use this interface. From: Hans J. Koch <hjk@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benedikt Spranger <b.spranger@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:57:15 -07:00
David Brownell	aebdc3b450	dev_vdbg(), available with -DVERBOSE_DEBUG This defines a dev_vdbg() call, which is enabled with -DVERBOSE_DEBUG. When enabled, dev_vdbg() acts just like dev_dbg(). When disabled, it is a NOP ... just like dev_dbg() without -DDEBUG. The specific code was moved out of a USB patch, but lots of drivers have similar support. That is, code can now be written to use an additional level of debug output, selected at compile time. Many driver authors have found this idiom to be very useful. A typical usage model is for "normal" debug messages to focus on fault paths and not be very "chatty", so that those messages can be left on during normal operation without much of a performance or syslog load. On the other hand "verbose" messages would be noisy enough that they wouldn't normally be enabled; they might even affect timings enough to change system or driver behavior. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:50 -07:00
Alan Stern	3f8df781fc	PM: remove deprecated dpm_runtime_* routines This patch (as933) removes the deprecated dpm_runtime_suspend() and dpm_runtime_resume() routines from the PM core. The only user of those routines is the PCMCIA ds driver; local replacements are added. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:49 -07:00
Kay Sievers	60a96a5956	Driver core: accept all valid action-strings in uevent-trigger This allows the uevent file to handle any type of uevent action to be triggered by userspace instead of just the "add" uevent. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-07-18 15:49:49 -07:00
Andy Fleming	7132ab7f6e	Fix RGMII-ID handling in gianfar The TSEC/eTSEC can detect the interface to the PHY automatically, but it isn't able to detect whether the RGMII connection needs internal delay. So we need to detect that change in the device tree, propagate it to the platform data, and then check it if we're in RGMII. This fixes a bug on the 8641D HPCN board where the Vitesse PHY doesn't use the delay for RGMII. Signed-off-by: Andy Fleming <afleming@freescale.com>	2007-07-18 18:29:37 -04:00
Linus Torvalds	5bae7ac9fe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6: [AVR32] Initialize phy_mask for both macb devices [AVR32] Fix atomic_add_unless() and atomic_sub_unless() [AVR32] Correct misspelled CONFIG_BLK_DEV_INITRD variable. [AVR32] Fix build error in parse_tag_rdimg() [AVR32] Don't wire up macb0 unless SW6 is in default position [AVR32] Wire up SSC platform device 0 as TX on ATSTK1000 board [AVR32] Add Atmel SSC driver platform device to AT32AP architecture [AVR32] Remove optimization of unaligned word loads [AVR32] Make STK1000 mux settings configurable [AVR32] CPU frequency scaling for AT32AP [AVR32] Split SM device into PM, RTC, WDT and EIC [AVR32] faster avr32 unaligned access	2007-07-18 12:57:52 -07:00
Haavard Skinnemoen	3da86ee4f1	[AVR32] Fix atomic_add_unless() and atomic_sub_unless() These functions depend on "result" being initalized to 0, but "result" is not included as an input constraint to the inline assembly block following its initialization, only as an output constraint. Thus gcc thinks it doesn't need to initialize it, so result ends up undefined if the "unless" condition is true. This fixes an oops in sunrpc where the faulty atomics caused rpciod_up() to not start the workqueue as it should. Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>	2007-07-18 20:47:04 +02:00
Hans-Christian Egtvedt	9cf6cf58d0	[AVR32] Add Atmel SSC driver platform device to AT32AP architecture This patch adds register definitions, clocks and IRQs to the platform devices. Signed-off-by: Hans-Christian Egtvedt <hcegtvedt@atmel.com> Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>	2007-07-18 20:45:52 +02:00
Haavard Skinnemoen	e122eaf694	[AVR32] Remove optimization of unaligned word loads If we let unaligned word loads bypass the generic unaligned handling, gcc may combine it with a swap.b instruction and turn it into a ldwsp instruction, which does not work with unaligned addresses. Revert the optimization to prevent the RNDIS driver from crashing. Hopefully we'll figure something out later (it may be better to do the optimization in gcc.) Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>	2007-07-18 20:45:51 +02:00
Haavard Skinnemoen	7a5b805907	[AVR32] Split SM device into PM, RTC, WDT and EIC Split the SM platform device into separate platform devices for PM, RTC, WDT and EIC. This is more correct according to the documentation and allows us to simplify the code a little. Also turn the EIC driver into a real platform driver. Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Acked-by: Hans-Christian Egtvedt <hcegtvedt@atmel.com>	2007-07-18 20:45:51 +02:00
David Brownell	c6083cd61b	[AVR32] faster avr32 unaligned access Use a more conventional implementation for unaligned access, and include an AT32AP-specific optimization: the CPU will handle unaligned words. The result is always faster and smaller for 8, 16, and 32 bit values. For 64 bit quantities, it's presumably larger. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>	2007-07-18 20:45:50 +02:00
Linus Torvalds	a267c0a887	Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb * 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (126 commits) V4L/DVB (5847): Clean up schedule_timeout calls in cpia2 and ivtv code V4L/DVB (5846): Clean up setting state and scheduling timeouts V4L/DVB (5844): ivtv: add high volume debugging flag V4L/DVB (5843): ivtv: fix missing signal_pending check. V4L/DVB (5842): ivtv: Add locking to ensure stream setup is atomic. V4L/DVB (5841): tveeprom: add support for Philips FQ1216LME MK3 tuner. V4L/DVB (5840): fix dst and cx24123: tune() callback changed signess for delay V4L/DVB (5838): dvb-core: Fix signedness warnings (gcc 4.1.1, kernel 2.6.22) V4L/DVB (5837): stv0299: Fix signedness warning (gcc 4.1.1, kernel 2.6.22) V4L/DVB (5836): dvb-ttpci: re-initialize aspect ratio and pan scan after arm crash V4L/DVB (5835): saa7146/dvb-ttpci: Fix signedness warnings (gcc 4.1.1, kernel 2.6.22) V4L/DVB (5834): dvb-core: fix signedness warnings and const stripping V4L/DVB (5832): ir-common: optimize bit extract function V4L/DVB (5831): stradis: use ARRAY_SIZE V4L/DVB (5829): Firmware extract and loading for opera dvb-usb update V4L/DVB (5828): Kconfig: Added GemTek USB radio and removed experimental dependency. V4L/DVB (5826): Usbvision: video mux cleanup V4L/DVB (5825): Alter the tuner type for the WinTV USB UK PAL model. V4L/DVB (5824): Usbvision: Hauppauge WinTV USB SECAM_L fix V4L/DVB (5821): Saa7134: add remote control support for LifeView FlyDVB-S LR300 ...	2007-07-18 11:25:58 -07:00
Linus Torvalds	d756d10e24	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: extent macros cleanup Fix compilation with EXT_DEBUG, also fix leXX_to_cpu conversions. ext4: remove extra IS_RDONLY() check ext4: Use is_power_of_2() Use zero_user_page() in ext4 where possible ext4: Remove 65000 subdirectory limit ext4: Expand extra_inodes space per the s_{want,min}_extra_isize fields ext4: Add nanosecond timestamps jbd2: Move jbd2-debug file to debugfs jbd2: Fix CONFIG_JBD_DEBUG ifdef to be CONFIG_JBD2_DEBUG ext4: Set the journal JBD2_FEATURE_INCOMPAT_64BIT on large devices ext4: Make extents code sanely handle on-disk corruption ext4: copy i_flags to inode flags on write ext4: Enable extents by default Change on-disk format to support 2^15 uninitialized extents write support for preallocated blocks fallocate support in ext4 sys_fallocate() implementation on i386, x86_64 and powerpc	2007-07-18 10:32:00 -07:00
Linus Torvalds	cdf4a6482d	Merge branch 'upstream' of git://git.infradead.org/~dedekind/ubi-2.6 * 'upstream' of git://git.infradead.org/~dedekind/ubi-2.6: (28 commits) UBI: fix compile warning UBI: fix error handling in erase worker UBI: fix comments UBI: remove unneeded error checks UBI: cleanup usage of try_module_get UBI: fix overflow bug UBI: bugfix in max_sqnum calculation UBI: bugfix in sqnum calculation UBI: fix signed-unsigned multiplication UBI: fix bug in atomic_leb_change() UBI: fix message UBI: fix debugging stuff UBI: bugfix in error path UBI: use is_power_of_2() UBI: fix freeing ubi->vtbl while unloading UBI: fix MAINTAINERS UBI: bugfix in ubi_leb_change() UBI: kill homegrown endian macros UBI: cleanup ioctl handling UBI: error path bugfix ...	2007-07-18 10:27:24 -07:00
Oliver Endriss	804b445894	V4L/DVB (5835): saa7146/dvb-ttpci: Fix signedness warnings (gcc 4.1.1, kernel 2.6.22) Fix signedness warnings (gcc 4.1.1, kernel 2.6.22). Signed-off-by: Oliver Endriss <o.endriss@gmx.de> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2007-07-18 14:24:44 -03:00
Linus Torvalds	485cf925d8	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (24 commits) [NETFILTER]: xt_connlimit needs to depend on nf_conntrack [NETFILTER]: ipt_iprange.h must #include <linux/types.h> [IrDA]: Fix IrDA build failure [ATM]: nicstar needs virt_to_bus [NET]: move __dev_addr_discard adjacent to dev_addr_discard for readability [NET]: merge dev_unicast_discard and dev_mc_discard into one [NET]: move dev_mc_discard from dev_mcast.c to dev.c [NETLINK]: negative groups in netlink_setsockopt [PPPOL2TP]: Reset meta-data in xmit function [PPPOL2TP]: Fix use-after-free [PKT_SCHED]: Some typo fixes in net/sched/Kconfig [XFRM]: Fix crash introduced by struct dst_entry reordering [TCP]: remove unused argument to cong_avoid op [ATM]: [idt77252] Rename CONFIG_ATM_IDT77252_SEND_IDLE to not resemble a Kconfig variable [ATM]: [drivers] ioremap balanced with iounmap [ATM]: [lanai] sram_test_word() must be __devinit [ATM]: [nicstar] Replace C code with call to ARRAY_SIZE() macro. [ATM]: Eliminate dead config variable CONFIG_BR2684_FAST_TRANS. [ATM]: Replacing kmalloc/memset combination with kzalloc. [NET]: gen_estimator deadlock fix ...	2007-07-18 10:24:36 -07:00
Michael Krufky	8218b0b2ca	V4L/DVB (5793): Tuner: remove hardware-specific info from public header Move internal structures and debug macros to drivers/media/video/tuner-driver.h Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2007-07-18 14:24:23 -03:00
Michael Krufky	7a91a80a0d	V4L/DVB (5753): Tuner: create struct tuner_operations Move tuner callback function pointers out of struct tuner, into struct tuner_operations. Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2007-07-18 14:24:02 -03:00
Michael Krufky	be2b85a135	V4L/DVB (5741): Tuner: add release callback Individual tuner drivers are now allocating memory themselves for their own private data structures. This changeset adds a release callback to the tuner operations, so that newer drivers that may require more complex data structures may release this private data themselves. Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2007-07-18 14:23:54 -03:00
Michael Krufky	b208319993	V4L/DVB (5719): Tuner: Move device-specific private data out of tuner struct Create private data struct for device specific private data. Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2007-07-18 14:23:48 -03:00
Linus Torvalds	31bdc5dc76	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC64]: Set vio->desc_buf to NULL after freeing. [SPARC]: Mark sparc and sparc64 as not having virt_to_bus [SPARC64]: Fix reset handling in VNET driver. [SPARC64]: Handle reset events in vio_link_state_change(). [SPARC64]: Handle LDC resets properly in domain-services driver. [SPARC64]: Massively simplify VIO device layer and support hot add/remove. [SPARC64]: Simplify VNET probing. [SPARC64]: Simplify VDC device probing. [SPARC64]: Add basic infrastructure for MD add/remove notification.	2007-07-18 10:23:37 -07:00
Mauro Carvalho Chehab	8573a9e6a8	V4L/DVB (5563a): Add experimental support for tea5761 tuner This driver were made based on tea5761 specs. Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2007-07-18 14:23:11 -03:00
Jeremy Fitzhardinge	60223a326f	xen: Place vcpu_info structure into per-cpu memory An experimental patch for Xen allows guests to place their vcpu_info structs anywhere. We try to use this to place the vcpu_info into the PDA, which allows direct access. If this works, then switch to using direct access operations for irq_enable, disable, save_fl and restore_fl. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Keir Fraser <keir@xensource.com>	2007-07-18 08:47:45 -07:00
Jeremy Fitzhardinge	9f27ee5950	xen: add virtual block device driver. The block device frontend driver allows the kernel to access block devices exported exported by a virtual machine containing a physical block device driver. Signed-off-by: Ian Pratt <ian.pratt@xensource.com> Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Greg KH <greg@kroah.com> Cc: Jens Axboe <axboe@kernel.dk>	2007-07-18 08:47:45 -07:00
Jeremy Fitzhardinge	4bac07c993	xen: add the Xenbus sysfs and virtual device hotplug driver This communicates with the machine control software via a registry residing in a controlling virtual machine. This allows dynamic creation, destruction and modification of virtual device configurations (network devices, block devices and CPUS, to name some examples). [ Greg, would you mind giving this a review? Thanks -J ] Signed-off-by: Ian Pratt <ian.pratt@xensource.com> Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Greg KH <greg@kroah.com>	2007-07-18 08:47:45 -07:00
Jeremy Fitzhardinge	ad9a86121f	xen: Add grant table support Add Xen 'grant table' driver which allows granting of access to selected local memory pages by other virtual machines and, symmetrically, the mapping of remote memory pages which other virtual machines have granted access to. This driver is a prerequisite for many of the Xen virtual device drivers, which grant the 'device driver domain' restricted and temporary access to only those memory pages that are currently involved in I/O operations. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ian Pratt <ian.pratt@xensource.com> Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: Chris Wright <chrisw@sous-sol.org>	2007-07-18 08:47:44 -07:00
Jeremy Fitzhardinge	b536b4b962	xen: use the hvc console infrastructure for Xen console Implement a Xen back-end for hvc console. * * * Add early printk support via hvc console, enable using "earlyprintk=xen" on the kernel command line. From: Gerd Hoffmann <kraxel@suse.de> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Olof Johansson <olof@lixom.net>	2007-07-18 08:47:44 -07:00
Jeremy Fitzhardinge	f87e4cac4f	xen: SMP guest support This is a fairly straightforward Xen implementation of smp_ops. Xen has its own IPI mechanisms, and has no dependency on any APIC-based IPI. The smp_ops hooks and the flush_tlb_others pv_op allow a Xen guest to avoid all APIC code in arch/i386 (the only apic operation is a single apic_read for the apic version number). One subtle point which needs to be addressed is unpinning pagetables when another cpu may have a lazy tlb reference to the pagetable. Xen will not allow an in-use pagetable to be unpinned, so we must find any other cpus with a reference to the pagetable and get them to shoot down their references. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andi Kleen <ak@suse.de>	2007-07-18 08:47:44 -07:00
Jeremy Fitzhardinge	c85b04c374	xen: add pinned page flag Add a new definition for PG_owner_priv_1 to define PG_pinned on Xen pagetable pages. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>	2007-07-18 08:47:43 -07:00
Jeremy Fitzhardinge	e46cdb66c8	xen: event channels Xen implements interrupts in terms of event channels. Each guest domain gets 1024 event channels which can be used for a variety of purposes, such as Xen timer events, inter-domain events, inter-processor events (IPI) or for real hardware IRQs. Within the kernel, we map the event channels to IRQs, and implement the whole interrupt handling using a Xen irq_chip. Rather than setting NR_IRQ to 1024 under PARAVIRT in order to accomodate Xen, we create a dynamic mapping between event channels and IRQs. Ideally, Linux will eventually move towards dynamically allocating per-irq structures, and we can use a 1:1 mapping between event channels and irqs. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric W. Biederman <ebiederm@xmission.com>	2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge	5ead97c84f	xen: Core Xen implementation This patch is a rollup of all the core pieces of the Xen implementation, including: - booting and setup - pagetable setup - privileged instructions - segmentation - interrupt flags - upcalls - multicall batching BOOTING AND SETUP The vmlinux image is decorated with ELF notes which tell the Xen domain builder what the kernel's requirements are; the domain builder then constructs the address space accordingly and starts the kernel. Xen has its own entrypoint for the kernel (contained in an ELF note). The ELF notes are set up by xen-head.S, which is included into head.S. In principle it could be linked separately, but it seems to provoke lots of binutils bugs. Because the domain builder starts the kernel in a fairly sane state (32-bit protected mode, paging enabled, flat segments set up), there's not a lot of setup needed before starting the kernel proper. The main steps are: 1. Install the Xen paravirt_ops, which is simply a matter of a structure assignment. 2. Set init_mm to use the Xen-supplied pagetables (analogous to the head.S generated pagetables in a native boot). 3. Reserve address space for Xen, since it takes a chunk at the top of the address space for its own use. 4. Call start_kernel() PAGETABLE SETUP Once we hit the main kernel boot sequence, it will end up calling back via paravirt_ops to set up various pieces of Xen specific state. One of the critical things which requires a bit of extra care is the construction of the initial init_mm pagetable. Because Xen places tight constraints on pagetables (an active pagetable must always be valid, and must always be mapped read-only to the guest domain), we need to be careful when constructing the new pagetable to keep these constraints in mind. It turns out that the easiest way to do this is use the initial Xen-provided pagetable as a template, and then just insert new mappings for memory where a mapping doesn't already exist. This means that during pagetable setup, it uses a special version of xen_set_pte which ignores any attempt to remap a read-only page as read-write (since Xen will map its own initial pagetable as RO), but lets other changes to the ptes happen, so that things like NX are set properly. PRIVILEGED INSTRUCTIONS AND SEGMENTATION When the kernel runs under Xen, it runs in ring 1 rather than ring 0. This means that it is more privileged than user-mode in ring 3, but it still can't run privileged instructions directly. Non-performance critical instructions are dealt with by taking a privilege exception and trapping into the hypervisor and emulating the instruction, but more performance-critical instructions have their own specific paravirt_ops. In many cases we can avoid having to do any hypercalls for these instructions, or the Xen implementation is quite different from the normal native version. The privileged instructions fall into the broad classes of: Segmentation: setting up the GDT and the GDT entries, LDT, TLS and so on. Xen doesn't allow the GDT to be directly modified; all GDT updates are done via hypercalls where the new entries can be validated. This is important because Xen uses segment limits to prevent the guest kernel from damaging the hypervisor itself. Traps and exceptions: Xen uses a special format for trap entrypoints, so when the kernel wants to set an IDT entry, it needs to be converted to the form Xen expects. Xen sets int 0x80 up specially so that the trap goes straight from userspace into the guest kernel without going via the hypervisor. sysenter isn't supported. Kernel stack: The esp0 entry is extracted from the tss and provided to Xen. TLB operations: the various TLB calls are mapped into corresponding Xen hypercalls. Control registers: all the control registers are privileged. The most important is cr3, which points to the base of the current pagetable, and we handle it specially. Another instruction we treat specially is CPUID, even though its not privileged. We want to control what CPU features are visible to the rest of the kernel, and so CPUID ends up going into a paravirt_op. Xen implements this mainly to disable the ACPI and APIC subsystems. INTERRUPT FLAGS Xen maintains its own separate flag for masking events, which is contained within the per-cpu vcpu_info structure. Because the guest kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely ignored (and must be, because even if a guest domain disables interrupts for itself, it can't disable them overall). (A note on terminology: "events" and interrupts are effectively synonymous. However, rather than using an "enable flag", Xen uses a "mask flag", which blocks event delivery when it is non-zero.) There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which are implemented to manage the Xen event mask state. The only thing worth noting is that when events are unmasked, we need to explicitly see if there's a pending event and call into the hypervisor to make sure it gets delivered. UPCALLS Xen needs a couple of upcall (or callback) functions to be implemented by each guest. One is the event upcalls, which is how events (interrupts, effectively) are delivered to the guests. The other is the failsafe callback, which is used to report errors in either reloading a segment register, or caused by iret. These are implemented in i386/kernel/entry.S so they can jump into the normal iret_exc path when necessary. MULTICALL BATCHING Xen provides a multicall mechanism, which allows multiple hypercalls to be issued at once in order to mitigate the cost of trapping into the hypervisor. This is particularly useful for context switches, since the 4-5 hypercalls they would normally need (reload cr3, update TLS, maybe update LDT) can be reduced to one. This patch implements a generic batching mechanism for hypercalls, which gets used in many places in the Xen code. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Ian Pratt <ian.pratt@xensource.com> Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Cc: Adrian Bunk <bunk@stusta.de>	2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge	a42089dd35	xen: Add Xen interface header files Add Xen interface header files. These are taken fairly directly from the Xen tree, but somewhat rearranged to suit the kernel's conventions. Define macros and inline functions for doing hypercalls into the hypervisor. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ian Pratt <ian.pratt@xensource.com> Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: Chris Wright <chrisw@sous-sol.org>	2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge	688340ea34	Add a sched_clock paravirt_op The tsc-based get_scheduled_cycles interface is not a good match for Xen's runstate accounting, which reports everything in nanoseconds. This patch replaces this interface with a sched_clock interface, which matches both Xen and VMI's requirements. In order to do this, we: 1. replace get_scheduled_cycles with sched_clock 2. hoist cycles_2_ns into a common header 3. update vmi accordingly One thing to note: because sched_clock is implemented as a weak function in kernel/sched.c, we must define a real function in order to override this weak binding. This means the usual paravirt_ops technique of using an inline function won't work in this case. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Zachary Amsden <zach@vmware.com> Cc: Dan Hecht <dhecht@vmware.com> Cc: john stultz <johnstul@us.ibm.com>	2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge	d572929cdd	paravirt: helper to disable all IO space In a virtual environment, device drivers such as legacy IDE will waste quite a lot of time probing for their devices which will never appear. This helper function allows a paravirt implementation to lay claim to the whole iomem and ioport space, thereby disabling all device drivers trying to claim IO resources. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Rusty Russell <rusty@rustcorp.com.au>	2007-07-18 08:47:42 -07:00
Jeremy Fitzhardinge	5f4352fbff	Allocate and free vmalloc areas Allocate/release a chunk of vmalloc address space: alloc_vm_area reserves a chunk of address space, and makes sure all the pagetables are constructed for that address range - but no pages. free_vm_area releases the address space range. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ian Pratt <ian.pratt@xensource.com> Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: "Jan Beulich" <JBeulich@novell.com> Cc: "Andi Kleen" <ak@muc.de>	2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge	c70df74376	paravirt: make siblingmap functions visible Paravirt implementations need to set the sibling map on new cpus. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>	2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge	724faa89cc	paravirt: unstatic smp_store_cpu_info Paravirt implementations need to store cpu info when bringing up cpus. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>	2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge	5378701324	paravirt: unstatic leave_mm Make globally leave_mm visible, specifically so that Xen can use it to shoot-down lazy uses of cr3. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>	2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge	03f0c2f950	paravirt: increase IRQ limit When running with CONFIG_PARAVIRT, we may want lots of IRQs even if there's no IO APIC. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com>	2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge	6996d3b63f	paravirt: add a hook for once the allocator is ready Add a hook so that the paravirt backend knows when the allocator is ready. This is useful for the obvious reason that the allocator is available, but the other side-effect of having the bootmem allocator available is that each page now has an associated "struct page". Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>	2007-07-18 08:47:41 -07:00
Jeremy Fitzhardinge	fdb4c338c8	paravirt: add an "mm" argument to alloc_pt It's useful to know which mm is allocating a pagetable. Xen uses this to determine whether the pagetable being added to is pinned or not. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>	2007-07-18 08:47:40 -07:00

1 2 3 4 5 ...

14939 Commits