linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-27 21:15:04 +07:00

Author	SHA1	Message	Date
Joerg Roedel	c5a78f2b64	KVM: MMU: Make tdp_enabled a mmu-context parameter This patch changes the tdp_enabled flag from its global meaning to the mmu-context and renames it to direct_map there. This is necessary for Nested SVM with emulation of Nested Paging where we need an extra MMU context to shadow the Nested Nested Page Table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:28 +02:00
Jes Sorensen	b9a52c4b78	x86: Define MSR_EBC_FREQUENCY_ID Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:26 +02:00
Gleb Natapov	d2ddd1c483	KVM: x86 emulator: get rid of "restart" in emulation context. x86_emulate_insn() will return 1 if instruction can be restarted without re-entering a guest. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:34 +02:00
Zachary Amsden	1d5f066e0b	KVM: x86: Fix a possible backwards warp of kvmclock Kernel time, which advances in discrete steps may progress much slower than TSC. As a result, when kvmclock is adjusted to a new base, the apparent time to the guest, which runs at a much higher, nsec scaled rate based on the current TSC, may have already been observed to have a larger value (kernel_ns + scaled tsc) than the value to which we are setting it (kernel_ns + 0). We must instead compute the clock as potentially observed by the guest for kernel_ns to make sure it does not go backwards. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:24 +02:00
Zachary Amsden	347bb4448c	x86: pvclock: Move scale_delta into common header The scale_delta function for shift / multiply with 31-bit precision moves to a common header so it can be used by both kernel and kvm module. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:24 +02:00
Zachary Amsden	e48672fa25	KVM: x86: Unify TSC logic Move the TSC control logic from the vendor backends into x86.c by adding adjust_tsc_offset to x86 ops. Now all TSC decisions can be done in one place. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:23 +02:00
Zachary Amsden	f38e098ff3	KVM: x86: TSC reset compensation Attempt to synchronize TSCs which are reset to the same value. In the case of a reliable hardware TSC, we can just re-use the same offset, but on non-reliable hardware, we can get closer by adjusting the offset to match the elapsed time. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	99e3e30aee	KVM: x86: Move TSC offset writes to common code Also, ensure that the storing of the offset and the reading of the TSC are never preempted by taking a spinlock. While the lock is overkill now, it is useful later in this patch series. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	ae38436b78	KVM: x86: Drop vm_init_tsc This is used only by the VMX code, and is not done properly; if the TSC is indeed backwards, it is out of sync, and will need proper handling in the logic at each and every CPU change. For now, drop this test during init as misguided. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:21 +02:00
Dave Hansen	49d5ca2663	KVM: replace x86 kvm n_free_mmu_pages with n_used_mmu_pages Doing this makes the code much more readable. That's borne out by the fact that this patch removes code. "used" also happens to be the number that we need to return back to the slab code when our shrinker gets called. Keeping this value as opposed to free makes the next patch simpler. So, 'struct kvm' is kzalloc()'d. 'struct kvm_arch' is a structure member (and not a pointer) of 'struct kvm'. That means they start out zeroed. I _think_ they get initialized properly by kvm_mmu_change_mmu_pages(). But, that only happens via kvm ioctls. Another benefit of storing 'used' intead of 'free' is that the values are consistent from the moment the structure is allocated: no negative "used" value. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:18 +02:00
Dave Hansen	39de71ec53	KVM: rename x86 kvm->arch.n_alloc_mmu_pages arch.n_alloc_mmu_pages is a poor choice of name. This value truly means, "the number of pages which _may_ be allocated". But, reading the name, "n_alloc_mmu_pages" implies "the number of allocated mmu pages", which is dead wrong. It's really the high watermark, so let's give it a name to match: nr_max_mmu_pages. This change will make the next few patches much more obvious and easy to read. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:18 +02:00
Mohammed Gamal	160ce1f1a8	KVM: x86 emulator: Allow accessing IDT via emulator ops The patch adds a new member get_idt() to x86_emulate_ops. It also adds a function to get the idt in order to be used by the emulator. This is needed for real mode interrupt injection and the emulation of int instructions. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:59 +02:00
Alexander Graf	ba49296236	KVM: Move kvm_guest_init out of generic code Currently x86 is the only architecture that uses kvm_guest_init(). With PowerPC we're getting a second user, but the signature is different there and we don't need to export it, as it uses the normal kernel init framework. So let's move the x86 specific definition of that function over to the x86 specfic header file. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:49 +02:00
Avi Kivity	2dbd0dd711	KVM: x86 emulator: Decode memory operands directly into a 'struct operand' Since modrm operand can be either register or memory, decoding it into a 'struct operand', which can represent both, is simpler. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:40 +02:00
Avi Kivity	d4709c78ee	KVM: x86 emulator: drop use_modrm_ea Unused (and has never been). Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:34 +02:00
Avi Kivity	1a6440aef6	KVM: x86 emulator: use correct type for memory address in operands Currently we use a void pointer for memory addresses. That's wrong since these are guest virtual addresses which are not directly dereferencable by the host. Use the correct type, unsigned long. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Avi Kivity	09ee57cdae	KVM: x86 emulator: push segment override out of decode_modrm() Let it compute modrm_seg instead, and have the caller apply it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Gleb Natapov	4fc40f076f	KVM: x86 emulator: check io permissions only once for string pio Do not recheck io permission on every iteration. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:29 +02:00
Avi Kivity	ef65c88912	KVM: x86 emulator: allow storing emulator execution function in decode tables Instead of looking up the opcode twice (once for decode flags, once for the big execution switch) look up both flags and function in the decode tables. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:22 +02:00
Avi Kivity	9aabc88fc8	KVM: x86 emulator: store x86_emulate_ops in emulation context It doesn't ever change, so we don't need to pass it around everywhere. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:21 +02:00
Avi Kivity	9581d442b9	KVM: Fix fs/gs reload oops with invalid ldt kvm reloads the host's fs and gs blindly, however the underlying segment descriptors may be invalid due to the user modifying the ldt after loading them. Fix by using the safe accessors (loadsegment() and load_gs_index()) instead of home grown unsafe versions. This is CVE-2010-3698. KVM-Stable-Tag. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-19 14:21:45 -02:00
Linus Torvalds	050026feae	Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Avoid 'constant_test_bit()' misoptimization due to cast to non-volatile	2010-09-27 21:19:27 -07:00
Linus Torvalds	6a6aa2b7e4	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/amd-iommu: Fix rounding-bug in __unmap_single x86/amd-iommu: Work around S3 BIOS bug x86/amd-iommu: Set iommu configuration flags in enable-loop x86, setup: Fix earlyprintk=serial,0x3f8,115200 x86, setup: Fix earlyprintk=serial,ttyS0,115200	2010-09-27 12:22:21 -07:00
Alexander Chumachenko	c9e2fbd909	x86: Avoid 'constant_test_bit()' misoptimization due to cast to non-volatile While debugging bit_spin_lock() hang, it was tracked down to gcc-4.4 misoptimization of non-inlined constant_test_bit() due to non-volatile addr when 'const volatile unsigned long addr' cast to 'unsigned long ' with subsequent unconditional jump to pause (and not to the test) leading to hang. Compiling with gcc-4.3 or disabling CONFIG_OPTIMIZE_INLINING yields inlined constant_test_bit() and correct jump, thus working around the kernel bug. Other arches than asm-x86 may implement this slightly differently; 2.6.29 mitigates the misoptimization by changing the function prototype (commit `c4295fbb60`) but probably fixing the issue itself is better. Signed-off-by: Alexander Chumachenko <ledest@gmail.com> Signed-off-by: Michael Shigorin <mike@osdn.org.ua> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-09-26 22:43:07 -07:00
Jan Beulich	a46590533a	x86/hwmon: fix initialization of coretemp Using cpuid_eax() to determine feature availability on other than the current CPU is invalid. And feature availability should also be checked in the hotplug code path. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Rudolf Marek <r.marek@assembler.cz> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>	2010-09-24 11:44:19 -07:00
Ingo Molnar	7329cf0201	Merge branch 'amd-iommu/2.6.36' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent	2010-09-24 11:19:53 +02:00
Joerg Roedel	4c894f47bb	x86/amd-iommu: Work around S3 BIOS bug This patch adds a workaround for an IOMMU BIOS problem to the AMD IOMMU driver. The result of the bug is that the IOMMU does not execute commands anymore when the system comes out of the S3 state resulting in system failure. The bug in the BIOS is that is does not restore certain hardware specific registers correctly. This workaround reads out the contents of these registers at boot time and restores them on resume from S3. The workaround is limited to the specific IOMMU chipset where this problem occurs. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2010-09-23 16:26:03 +02:00
Joerg Roedel	e9bf519711	x86/amd-iommu: Set iommu configuration flags in enable-loop This patch moves the setting of the configuration and feature flags out out the acpi table parsing path and moves it into the iommu-enable path. This is needed to reliably fix resume-from-s3. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2010-09-23 16:24:50 +02:00
Linus Torvalds	87ac6fa26e	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hw breakpoints: Fix pid namespace bug x86: Fix instruction breakpoint encoding oprofile: Add Support for Intel CPU Family 6 / Model 22 (Intel Celeron 540) kprobes: Fix Kconfig dependency	2010-09-21 13:21:42 -07:00
Linus Torvalds	a5b617368c	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: hpet: Work around hardware stupidity x86, build: Disable -fPIE when compiling with CONFIG_CC_STACKPROTECTOR=y x86, cpufeature: Suppress compiler warning with gcc 3.x x86, UV: Fix initialization of max_pnode	2010-09-16 19:38:08 -07:00
Frederic Weisbecker	89e45aac42	x86: Fix instruction breakpoint encoding Lengths and types of breakpoints are encoded in a half byte into CPU registers. However when we extract these values and store them, we add a high half byte part to them: 0x40 to the length and 0x80 to the type. When that gets reloaded to the CPU registers, the high part is masked. While making the instruction breakpoints available for perf, I zapped that high part on instruction breakpoint encoding and that broke the arch -> generic translation used by ptrace instruction breakpoints. Writing dr7 to set an inst breakpoint was then failing. There is no apparent reason for these high parts so we could get rid of them altogether. That's an invasive change though so let's do that later and for now fix the problem by restoring that inst breakpoint high part encoding in this sole patch. Reported-by: Kelvie Wong <kelvie@ieee.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Will Deacon <will.deacon@arm.com>	2010-09-17 03:24:13 +02:00
H. Peter Anvin	c41d68a513	compat: Make compat_alloc_user_space() incorporate the access_ok() compat_alloc_user_space() expects the caller to independently call access_ok() to verify the returned area. A missing call could introduce problems on some architectures. This patch incorporates the access_ok() check into compat_alloc_user_space() and also adds a sanity check on the length. The existing compat_alloc_user_space() implementations are renamed arch_compat_alloc_user_space() and are used as part of the implementation of the new global function. This patch assumes NULL will cause __get_user()/__put_user() to either fail or access userspace on all architectures. This should be followed by checking the return value of compat_access_user_space() for NULL in the callers, at which time the access_ok() in the callers can also be removed. Reported-by: Ben Hawkes <hawkes@sota.gen.nz> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: James Bottomley <jejb@parisc-linux.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: <stable@kernel.org>	2010-09-14 16:08:45 -07:00
Thomas Gleixner	54ff7e595d	x86: hpet: Work around hardware stupidity This more or less reverts commits `08be979` (x86: Force HPET readback_cmp for all ATI chipsets) and `30a564be` (x86, hpet: Restrict read back to affected ATI chipsets) to the status of commit `8da854c` (x86, hpet: Erratum workaround for read after write of HPET comparator). The delta to commit `8da854c` is mostly comments and the change from WARN_ONCE to printk_once as we know the call path of this function already. This needs really in depth explanation: First of all the HPET design is a complete failure. Having a counter compare register which generates an interrupt on matching values forces the software to do at least one superfluous readback of the counter register. While it is nice in theory to program "absolute" time events it is practically useless because the timer runs at some absurd frequency which can never be matched to real world units. So we are forced to calculate a relative delta and this forces a readout of the actual counter value, adding the delta and programming the compare register. When the delta is small enough we run into the danger that we program a compare value which is already in the past. Due to the compare for equal nature of HPET we need to read back the counter value after writing the compare rehgister (btw. this is necessary for absolute timeouts as well) to make sure that we did not miss the timer event. We try to work around that by setting the minimum delta to a value which is larger than the theoretical time which elapses between the counter readout and the compare register write, but that's only true in theory. A NMI or SMI which hits between the readout and the write can easily push us beyond that limit. This would result in waiting for the next HPET timer interrupt until the 32bit wraparound of the counter happens which takes about 306 seconds. So we designed the next event function to look like: match = read_cnt() + delta; write_compare_ref(match); return read_cnt() < match ? 0 : -ETIME; At some point we got into trouble with certain ATI chipsets. Even the above "safe" procedure failed. The reason was that the write to the compare register was delayed probably for performance reasons. The theory was that they wanted to avoid the synchronization of the write with the HPET clock, which is understandable. So the write does not hit the compare register directly instead it goes to some intermediate register which is copied to the real compare register in sync with the HPET clock. That opens another window for hitting the dreaded "wait for a wraparound" problem. To work around that "optimization" we added a read back of the compare register which either enforced the update of the just written value or just delayed the readout of the counter enough to avoid the issue. We unfortunately never got any affirmative info from ATI/AMD about this. One thing is sure, that we nuked the performance "optimization" that way completely and I'm pretty sure that the result is worse than before some HW folks came up with those. Just for paranoia reasons I added a check whether the read back compare register value was the same as the value we wrote right before. That paranoia check triggered a couple of years after it was added on an Intel ICH9 chipset. Venki added a workaround (commit `8da854c`) which was reading the compare register twice when the first check failed. We considered this to be a penalty in general and restricted the readback (thus the wasted CPU cycles) to the known to be affected ATI chipsets. This turned out to be a utterly wrong decision. 2.6.35 testers experienced massive problems and finally one of them bisected it down to commit `30a564be` which spured some further investigation. Finally we got confirmation that the write to the compare register can be delayed by up to two HPET clock cycles which explains the problems nicely. All we can do about this is to go back to Venki's initial workaround in a slightly modified version. Just for the record I need to say, that all of this could have been avoided if hardware designers and of course the HPET committee would have thought about the consequences for a split second. It's out of my comprehension why designing a working timer is so hard. There are two ways to achieve it: 1) Use a counter wrap around aware compare_reg <= counter_reg implementation instead of the easy compare_reg == counter_reg Downsides: - It needs more silicon. - It needs a readout of the counter to apply a relative timeout. This is necessary as the counter does not run in any useful (and adjustable) frequency and there is no guarantee that the counter which is used for timer events is the same which is used for reading the actual time (and therefor for calculating the delta) Upsides: - None 2) Use a simple down counter for relative timer events Downsides: - Absolute timeouts are not possible, which is not a problem at all in the context of an OS and the expected max. latencies/jitter (also see Downsides of #1) Upsides: - It needs less or equal silicon. - It works ALWAYS - It is way faster than a compare register based solution (One write versus one write plus at least one and up to four reads) I would not be so grumpy about all of this, if I would not have been ignored for many years when pointing out these flaws to various hardware folks. I really hate timers (at least those which seem to be designed by janitors). Though finally we got a reasonable explanation plus a solution and I want to thank all the folks involved in chasing it down and providing valuable input to this. Bisected-by: Nix <nix@esperi.org.uk> Reported-by: Artur Skawina <art.08.09@gmail.com> Reported-by: Damien Wyart <damien.wyart@free.fr> Reported-by: John Drescher <drescherjm@gmail.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: stable@kernel.org Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-09-15 00:55:13 +02:00
Tetsuo Handa	2fd818642a	x86, cpufeature: Suppress compiler warning with gcc 3.x Gcc 3.x generates a warning arch/x86/include/asm/cpufeature.h: In function `__static_cpu_has': arch/x86/include/asm/cpufeature.h:326: warning: asm operand 1 probably doesn't match constraints on each file. But static_cpu_has() for gcc 3.x does not need __static_cpu_has(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> LKML-Reference: <201008300127.o7U1RC6Z044051@www262.sakura.ne.jp> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-13 14:48:41 -07:00
Linus Torvalds	be6200aac9	Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Perform hardware_enable in CPU_STARTING callback KVM: i8259: fix migration KVM: fix i8259 oops when no vcpus are online KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts	2010-09-10 08:02:45 -07:00
Linus Torvalds	1faa6ec8cc	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks io-mapping: Fix the address space annotations x86: Fix the address space annotations of iomap_atomic_prot_pfn() x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline x86, hwmon: Fix unsafe smp_processor_id() in thermal_throttle_add_dev	2010-09-08 11:14:10 -07:00
Avi Kivity	16518d5ada	KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts operand::val and operand::orig_val are 32-bit on i386, whereas cmpxchg8b operands are 64-bit. Fix by adding val64 and orig_val64 union members to struct operand, and using them where needed. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-09-08 14:50:55 -03:00
Linus Torvalds	d56557af19	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: bus speed strings should be const PCI hotplug: Fix build with CONFIG_ACPI unset PCI: PCIe: Remove the port driver module exit routine PCI: PCIe: Move PCIe PME code to the pcie directory PCI: PCIe: Disable PCIe port services during port initialization PCI: PCIe: Ask BIOS for control of all native services at once ACPI/PCI: Negotiate _OSC control bits before requesting them ACPI/PCI: Do not preserve _OSC control bits returned by a query ACPI/PCI: Make acpi_pci_query_osc() return control bits ACPI/PCI: Reorder checks in acpi_pci_osc_control_set() PCI: PCIe: Introduce commad line switch for disabling port services PCI: PCIe AER: Introduce pci_aer_available() x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are set PCI: provide stub pci_domain_nr function for !CONFIG_PCI configs	2010-09-07 16:00:17 -07:00
Francisco Jerez	cc1a8e5233	x86: Fix the address space annotations of iomap_atomic_prot_pfn() This patch fixes the sparse warnings when the return pointer of iomap_atomic_prot_pfn() is used as an argument of iowrite32() and friends. Signed-off-by: Francisco Jerez <currojerez@riseup.net> LKML-Reference: <1283633804-11749-1-git-send-email-currojerez@riseup.net> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-05 14:26:14 +02:00
Linus Torvalds	5e686019df	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states sched: Fix rq->clock synchronization when migrating tasks	2010-08-25 08:40:56 -07:00
Linus Torvalds	36423a5ed5	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, apic: Fix apic=debug boot crash x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues x86-32: Fix dummy trampoline-related inline stubs x86-32: Separate 1:1 pagetables from swapper_pg_dir x86, cpu: Fix regression in AMD errata checking code	2010-08-20 14:25:08 -07:00
Suresh Siddha	cd7240c0b9	x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states TSC's get reset after suspend/resume (even on cpu's with invariant TSC which runs at a constant rate across ACPI P-, C- and T-states). And in some systems BIOS seem to reinit TSC to arbitrary large value (still sync'd across cpu's) during resume. This leads to a scenario of scheduler rq->clock (sched_clock_cpu()) less than rq->age_stamp (introduced in 2.6.32). This leads to a big value returned by scale_rt_power() and the resulting big group power set by the update_group_power() is causing improper load balancing between busy and idle cpu's after suspend/resume. This resulted in multi-threaded workloads (like kernel-compilation) go slower after suspend/resume cycle on core i5 laptops. Fix this by recomputing cyc2ns_offset's during resume, so that sched_clock() continues from the point where it was left off during suspend. Reported-by: Florian Pritz <flo@xssn.at> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: <stable@kernel.org> # [v2.6.32+] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1282262618.2675.24.camel@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-20 14:59:02 +02:00
H. Peter Anvin	8848a91068	x86-32: Fix dummy trampoline-related inline stubs Fix dummy inline stubs for trampoline-related functions when no trampolines exist (until we get rid of the no-trampoline case entirely.) Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <4C6C294D.3030404@zytor.com>	2010-08-18 12:42:24 -07:00
Joerg Roedel	fd89a13792	x86-32: Separate 1:1 pagetables from swapper_pg_dir This patch fixes machine crashes which occur when heavily exercising the CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by AMD Erratum 383 and result in a fatal machine check exception. Here's the scenario: 1. On 32-bit, the swapper_pg_dir page table is used as the initial page table for booting a secondary CPU. 2. To make this work, swapper_pg_dir needs a direct mapping of physical memory in it (the low mappings). By adding those low, large page (2M) mappings (PAE kernel), we create the necessary conditions for Erratum 383 to occur. 3. Other CPUs which do not participate in the off- and onlining game may use swapper_pg_dir while the low mappings are present (when leave_mm is called). For all steps below, the CPU referred to is a CPU that is using swapper_pg_dir, and not the CPU which is being onlined. 4. The presence of the low mappings in swapper_pg_dir can result in TLB entries for addresses below __PAGE_OFFSET to be established speculatively. These TLB entries are marked global and large. 5. When the CPU with such TLB entry switches to another page table, this TLB entry remains because it is global. 6. The process then generates an access to an address covered by the above TLB entry but there is a permission mismatch - the TLB entry covers a large global page not accessible to userspace. 7. Due to this permission mismatch a new 4kb, user TLB entry gets established. Further, Erratum 383 provides for a small window of time where both TLB entries are present. This results in an uncorrectable machine check exception signalling a TLB multimatch which panics the machine. There are two ways to fix this issue: 1. Always do a global TLB flush when a new cr3 is loaded and the old page table was swapper_pg_dir. I consider this a hack hard to understand and with performance implications 2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit does. This patch implements solution 2. It introduces a trampoline_pg_dir which has the same layout as swapper_pg_dir with low_mappings. This page table is used as the initial page table of the booting CPU. Later in the bringup process, it switches to swapper_pg_dir and does a global TLB flush. This fixes the crashes in our test cases. -v2: switch to swapper_pg_dir right after entering start_secondary() so that we are able to access percpu data which might not be mapped in the trampoline page table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> LKML-Reference: <20100816123833.GB28147@aftab> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-18 09:17:20 -07:00
David Howells	d7627467b7	Make do_execve() take a const filename pointer Make do_execve() take a const filename pointer so that kernel_execve() compiles correctly on ARM: arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type This also requires the argv and envp arguments to be consted twice, once for the pointer array and once for the strings the array points to. This is because do_execve() passes a pointer to the filename (now const) to copy_strings_kernel(). A simpler alternative would be to cast the filename pointer in do_execve() when it's passed to copy_strings_kernel(). do_execve() may not change any of the strings it is passed as part of the argv or envp lists as they are some of them in .rodata, so marking these strings as const should be fine. Further kernel_execve() and sys_execve() need to be changed to match. This has been test built on x86_64, frv, arm and mips. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-17 18:07:43 -07:00
Jesse Barnes	23b90cfd7b	x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are set Otherwise we'll duplicate definitions with the pci.h stubs. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-08-17 09:29:36 -07:00
Sam Ravnborg	bf56fba670	archs: replace unifdef-y with header-y unifdef-y and header-y have same semantic, so drop unifdef-y Signed-off-by: Sam Ravnborg <sam@ravnborg.org>	2010-08-14 22:26:51 +02:00
Linus Torvalds	c206d44ffd	Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, UV: Make kdump avoid stack dumps - fix !CONFIG_KEXEC breakage x86, UV: Initialize BAU hub map x86, UV: Make kdump avoid stack dumps	2010-08-13 18:00:25 -07:00
David Howells	c788732523	Mark arguments to certain syscalls as being const Mark arguments to certain system calls as being const where they should be but aren't. The list includes: () The filename arguments of various stat syscalls, execve(), various utimes syscalls and some mount syscalls. () The filename arguments of some syscall helpers relating to the above. (*) The buffer argument of various write syscalls. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-13 16:53:13 -07:00
Linus Torvalds	36450e9c95	Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, UV: Initialize BAU MMRs only on hubs with cpus x86, UV: Modularize BAU send and wait x86, UV: BAU broadcast to the local hub x86, UV: Correct BAU regular message type x86, UV: Remove BAU check for stay-busy x86, UV: Correct BAU discovery of hubs and sockets x86, UV: Correct BAU software acknowledge x86, UV: BAU structure rearranging x86, UV: Shorten access to BAU statistics structure x86, UV: Disable BAU on network congestion x86, UV: BAU tunables into a debugfs file x86, UV: Calculate BAU destination timeout	2010-08-13 10:38:37 -07:00

1 2 3 4 5 ...

2134 Commits