linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Andy Lutomirski	fce8dc0642	x86-64: Wire up getcpu syscall getcpu is available as a vdso entry and an emulated vsyscall. Programs that for some reason don't want to use the vdso should still be able to call getcpu without relying on the slow emulated vsyscall. It costs almost nothing to expose it as a real syscall. We also need this for the following patch in vsyscall=native mode. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/6b19f55bdb06a0c32c2fa6dba9b6f222e1fde999.1312988155.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-08-10 19:26:46 -05:00
Mathias Krause	66be895158	crypto: sha1 - SSSE3 based SHA1 implementation for x86-64 This is an assembler implementation of the SHA1 algorithm using the Supplemental SSE3 (SSSE3) instructions or, when available, the Advanced Vector Extensions (AVX). Testing with the tcrypt module shows the raw hash performance is up to 2.3 times faster than the C implementation, using 8k data blocks on a Core 2 Duo T5500. For the smalest data set (16 byte) it is still 25% faster. Since this implementation uses SSE/YMM registers it cannot safely be used in every situation, e.g. while an IRQ interrupts a kernel thread. The implementation falls back to the generic SHA1 variant, if using the SSE/YMM registers is not possible. With this algorithm I was able to increase the throughput of a single IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using the SSSE3 variant -- a speedup of +34.8%. Saving and restoring SSE/YMM state might make the actual throughput fluctuate when there are FPU intensive userland applications running. For example, meassuring the performance using iperf2 directly on the machine under test gives wobbling numbers because iperf2 uses the FPU for each packet to check if the reporting interval has expired (in the above test I got min/max/avg: 402/484/464 MBit/s). Using this algorithm on a IPsec gateway gives much more reasonable and stable numbers, albeit not as high as in the directly connected case. Here is the result from an RFC 2544 test run with a EXFO Packet Blazer FTB-8510: frame size sha1-generic sha1-ssse3 delta 64 byte 37.5 MBit/s 37.5 MBit/s 0.0% 128 byte 56.3 MBit/s 62.5 MBit/s +11.0% 256 byte 87.5 MBit/s 100.0 MBit/s +14.3% 512 byte 131.3 MBit/s 150.0 MBit/s +14.2% 1024 byte 162.5 MBit/s 193.8 MBit/s +19.3% 1280 byte 175.0 MBit/s 212.5 MBit/s +21.4% 1420 byte 175.0 MBit/s 218.7 MBit/s +25.0% 1518 byte 150.0 MBit/s 181.2 MBit/s +20.8% The throughput for the largest frame size is lower than for the previous size because the IP packets need to be fragmented in this case to make there way through the IPsec tunnel. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Maxim Locktyukhin <maxim.locktyukhin@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2011-08-10 19:00:29 +08:00
Borislav Petkov	dfb09f9b7a	x86, amd: Avoid cache aliasing penalties on AMD family 15h This patch provides performance tuning for the "Bulldozer" CPU. With its shared instruction cache there is a chance of generating an excessive number of cache cross-invalidates when running specific workloads on the cores of a compute module. This excessive amount of cross-invalidations can be observed if cache lines backed by shared physical memory alias in bits [14:12] of their virtual addresses, as those bits are used for the index generation. This patch addresses the issue by clearing all the bits in the [14:12] slice of the file mapping's virtual address at generation time, thus forcing those bits the same for all mappings of a single shared library across processes and, in doing so, avoids instruction cache aliases. It also adds the command line option "align_va_addr=(32\|64\|on\|off)" with which virtual address alignment can be enabled for 32-bit or 64-bit x86 individually, or both, or be completely disabled. This change leaves virtual region address allocation on other families and/or vendors unaffected. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1312550110-24160-2-git-send-email-bp@amd64.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-08-05 12:26:44 -07:00
Andy Lutomirski	318f5a2a67	x86-64: Add user_64bit_mode paravirt op Three places in the kernel assume that the only long mode CPL 3 selector is __USER_CS. This is not true on Xen -- Xen's sysretq changes cs to the magic value 0xe033. Two of the places are corner cases, but as of "x86-64: Improve vsyscall emulation CS and RIP handling" (`c9712944b2`), vsyscalls will segfault if called with Xen's extra CS selector. This causes a panic when older init builds die. It seems impossible to make Xen use __USER_CS reliably without taking a performance hit on every system call, so this fixes the tests instead with a new paravirt op. It's a little ugly because ptrace.h can't include paravirt.h. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/f4fcb3947340d9e96ce1054a432f183f9da9db83.1312378163.git.luto@mit.edu Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-08-04 16:13:49 -07:00
H. Peter Anvin	17b0436077	Merge commit 'v3.0' into x86/vdso	2011-08-04 16:13:20 -07:00
Linus Torvalds	33f35f2a4e	x86: don't include xen/xen.h in <asm/io.h> unless XEN is enabled Dmitry Kasatkin reports: "kernel-devel package with kernel headers have no <include/xen> directory if XEN is disabled. Modules which inclide asm/io.h won't compile. XEN related content is behind the CONFIG_XEN flag in the io.h. And <xen/xen.h> should be also behind CONFIG_XEN flag." So move the include of <xen/xen.h> down into the section that is conditional on CONFIG_XEN. Reported-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-03 22:00:38 -10:00
Linus Torvalds	35e51fe82d	Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: cpuidle: stop depending on pm_idle x86 idle: move mwait_idle_with_hints() to where it is used cpuidle: replace xen access to x86 pm_idle and default_idle cpuidle: create bootparam "cpuidle.off=1" mrst_pmu: driver for Intel Moorestown Power Management Unit	2011-08-03 21:54:15 -10:00
Len Brown	4bfc8288bc	x86 idle: move mwait_idle_with_hints() to where it is used ...and make it static no functional change cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2011-08-03 19:06:36 -04:00
H. Peter Anvin	49d859d78c	x86, random: Verify RDRAND functionality and allow it to be disabled If the CPU declares that RDRAND is available, go through a guranteed reseed sequence, and make sure that it is actually working (producing data.) If it does not, disable the CPU feature flag. Allow RDRAND to be disabled on the command line (as opposed to at compile time) for a user who has special requirements with regards to random numbers. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Theodore Ts'o" <tytso@mit.edu>	2011-07-31 14:02:19 -07:00
H. Peter Anvin	628c6246d4	x86, random: Architectural inlines to get random integers with RDRAND Architectural inlines to get random ints and longs using the RDRAND instruction. Intel has introduced a new RDRAND instruction, a Digital Random Number Generator (DRNG), which is functionally an high bandwidth entropy source, cryptographic whitener, and integrity monitor all built into hardware. This enables RDRAND to be used directly, bypassing the kernel random number pool. For technical documentation, see: http://software.intel.com/en-us/articles/download-the-latest-bull-mountain-software-implementation-guide/ In this patch, this is only used for the nonblocking random number pool. RDRAND is a nonblocking source, similar to our /dev/urandom, and is therefore not a direct replacement for /dev/random. The architectural hooks presented in the previous patch only feed the kernel internal users, which only use the nonblocking pool, and so this is not a problem. Since this instruction is available in userspace, there is no reason to have a /dev/hw_rng device driver for the purpose of feeding rngd. This is especially so since RDRAND is a nonblocking source, and needs additional whitening and reduction (see the above technical documentation for details) in order to be of "pure entropy source" quality. The CONFIG_EXPERT compile-time option can be used to disable this use of RDRAND. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Originally-by: Fenghua Yu <fenghua.yu@intel.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Theodore Ts'o" <tytso@mit.edu>	2011-07-31 13:59:29 -07:00
Arun Sharma	7847777a45	atomic: cleanup asm-generic atomic.h inclusion After changing all consumers of atomics to include <linux/atomic.h>, we ran into some compile time errors due to this dependency chain: linux/atomic.h -> asm/atomic.h -> asm-generic/atomic-long.h where atomic-long.h could use funcs defined later in linux/atomic.h without a prototype. This patches moves the code that includes asm-generic/atomic.h to linux/atomic.h. Archs that need <asm-generic/atomic64.h> need to select CONFIG_GENERIC_ATOMIC64 from now on (some of them used to include it unconditionally). Compile tested on i386 and x86_64 with allnoconfig. Signed-off-by: Arun Sharma <asharma@fb.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:47 -07:00
Arun Sharma	f24219b4e9	atomic: move atomic_add_unless to generic code This is in preparation for more generic atomic primitives based on __atomic_add_unless. Signed-off-by: Arun Sharma <asharma@fb.com> Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:47 -07:00
Arun Sharma	60063497a9	atomic: use <linux/atomic.h> This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:47 -07:00
Akinobu Mita	148817ba09	asm-generic: add another generic ext2 atomic bitops The majority of architectures implement ext2 atomic bitops as test_and_{set,clear}_bit() without spinlock. This adds this type of generic implementation in ext2-atomic-setbit.h and use it wherever possible. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Suggested-by: Andreas Dilger <adilger@dilger.ca> Suggested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:46 -07:00
Mike Frysinger	0e9a6cb5e6	ptrace: unify show_regs() prototype [ poleg@redhat.com: no need to declare show_regs() in ptrace.h, sched.h does this ] Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:43 -07:00
Linus Torvalds	fa8f53ace4	Merge branch 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, olpc-xo15-sci: Enable EC wakeup capability x86, olpc: Fix dependency on POWER_SUPPLY x86, olpc: Add XO-1.5 SCI driver x86, olpc: Add XO-1 RTC driver x86, olpc-xo1-sci: Propagate power supply/battery events x86, olpc-xo1-sci: Add lid switch functionality x86, olpc-xo1-sci: Add GPE handler and ebook switch functionality x86, olpc: EC SCI wakeup mask functionality x86, olpc: Add XO-1 SCI driver and power button control x86, olpc: Add XO-1 suspend/resume support x86, olpc: Rename olpc-xo1 to olpc-xo1-pm x86, olpc: Move CS5536-related constants to cs5535.h x86, olpc: Add missing elements to device tree	2011-07-26 11:11:54 -07:00
Linus Torvalds	ff0c4ad2c3	Merge branch 'for-upstream' of git://openrisc.net/jonas/linux * 'for-upstream' of git://openrisc.net/jonas/linux: (24 commits) OpenRISC: Add MAINTAINERS entry OpenRISC: Miscellaneous OpenRISC: Library routines OpenRISC: Headers OpenRISC: Traps OpenRISC: Module support OpenRISC: GPIO OpenRISC: Scheduling/Process management OpenRISC: Idle/Power management OpenRISC: System calls OpenRISC: IRQ OpenRISC: Timekeeping OpenRISC: DMA OpenRISC: PTrace OpenRISC: Build infrastructure OpenRISC: Signal handling OpenRISC: Memory management OpenRISC: Device tree OpenRISC: Boot code iomap: make IOPORT/PCI mapping functions conditional ...	2011-07-24 09:55:18 -07:00
Linus Torvalds	5fabc487c9	Merge branch 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits) KVM: IOMMU: Disable device assignment without interrupt remapping KVM: MMU: trace mmio page fault KVM: MMU: mmio page fault support KVM: MMU: reorganize struct kvm_shadow_walk_iterator KVM: MMU: lockless walking shadow page table KVM: MMU: do not need atomicly to set/clear spte KVM: MMU: introduce the rules to modify shadow page table KVM: MMU: abstract some functions to handle fault pfn KVM: MMU: filter out the mmio pfn from the fault pfn KVM: MMU: remove bypass_guest_pf KVM: MMU: split kvm_mmu_free_page KVM: MMU: count used shadow pages on prepareing path KVM: MMU: rename 'pt_write' to 'emulate' KVM: MMU: cleanup for FNAME(fetch) KVM: MMU: optimize to handle dirty bit KVM: MMU: cache mmio info on page fault path KVM: x86: introduce vcpu_mmio_gva_to_gpa to cleanup the code KVM: MMU: do not update slot bitmap if spte is nonpresent KVM: MMU: fix walking shadow page table KVM guest: KVM Steal time registration ...	2011-07-24 09:07:03 -07:00
Linus Torvalds	c61264f98c	Merge branch 'upstream/xen-tracing2' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/xen-tracing2' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen/trace: use class for multicall trace xen/trace: convert mmu events to use DECLARE_EVENT_CLASS()/DEFINE_EVENT() xen/multicall: move *idx fields to start of mc_buffer xen/multicall: special-case singleton hypercalls xen/multicalls: add unlikely around slowpath in __xen_mc_entry() xen/multicalls: disable MC_DEBUG xen/mmu: tune pgtable alloc/release xen/mmu: use extend_args for more mmuext updates xen/trace: add tlb flush tracepoints xen/trace: add segment desc tracing xen/trace: add xen_pgd_(un)pin tracepoints xen/trace: add ptpage alloc/release tracepoints xen/trace: add mmu tracepoints xen/trace: add multicall tracing xen/trace: set up tracepoint skeleton xen/multicalls: remove debugfs stats trace/xen: add skeleton for Xen trace events	2011-07-24 09:06:47 -07:00
Xiao Guangrong	c2a2ac2b56	KVM: MMU: lockless walking shadow page table Use rcu to protect shadow pages table to be freed, so we can safely walk it, it should run fastly and is needed by mmio page fault Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-24 11:50:38 +03:00
Xiao Guangrong	c37079586f	KVM: MMU: remove bypass_guest_pf The idea is from Avi: \| Maybe it's time to kill off bypass_guest_pf=1. It's not as effective as \| it used to be, since unsync pages always use shadow_trap_nonpresent_pte, \| and since we convert between the two nonpresent_ptes during sync and unsync. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-24 11:50:33 +03:00
Xiao Guangrong	bebb106a5a	KVM: MMU: cache mmio info on page fault path If the page fault is caused by mmio, we can cache the mmio info, later, we do not need to walk guest page table and quickly know it is a mmio fault while we emulate the mmio instruction Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-24 11:50:26 +03:00
Glauber Costa	d910f5c106	KVM guest: KVM Steal time registration This patch implements the kvm bits of the steal time infrastructure. The most important part of it, is the steal time clock. It is an continuous clock that shows the accumulated amount of steal time since vcpu creation. It is supposed to survive cpu offlining/onlining. [marcelo: fix build with CONFIG_KVM_GUEST=n] Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Avi Kivity <avi@redhat.com> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-24 11:49:36 +03:00
Linus Torvalds	148a7b1725	Merge branch 'x86-atomic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-atomic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Add support for cmpxchg_double	2011-07-23 10:35:43 -07:00
Linus Torvalds	9d0715630e	Merge branch 'timers-clocksource-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-clocksource-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clocksource: apb: Share APB timer code with other platforms	2011-07-23 10:34:47 -07:00
Linus Torvalds	8e204874db	Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64, vdso: Do not allocate memory for the vDSO clocksource: Change __ARCH_HAS_CLOCKSOURCE_DATA to a CONFIG option x86, vdso: Drop now wrong comment Document the vDSO and add a reference parser ia64: Replace clocksource.fsys_mmio with generic arch data x86-64: Move vread_tsc and vread_hpet into the vDSO clocksource: Replace vread with generic arch data x86-64: Add --no-undefined to vDSO build x86-64: Allow alternative patching in the vDSO x86: Make alternative instruction pointers relative x86-64: Improve vsyscall emulation CS and RIP handling x86-64: Emulate legacy vsyscalls x86-64: Fill unused parts of the vsyscall page with 0xcc x86-64: Remove vsyscall number 3 (venosys) x86-64: Map the HPET NX x86-64: Remove kernel.vsyscall64 sysctl x86-64: Give vvars their own page x86-64: Document some of entry_64.S x86-64: Fix alignment of jiffies variable	2011-07-22 17:05:15 -07:00
Linus Torvalds	3e0b8df79d	Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, UV: Correct UV2 BAU destination timeout x86, UV: Correct failed topology memory leak x86, UV: Remove cpumask_t from the stack x86, UV: Rename hubmask to pnmask x86, UV: Correct reset_with_ipi() x86, UV: Allow for non-consecutive sockets x86, UV: Inline header file functions x86, UV: Fix smp_processor_id() use in a preemptable region x66, UV: Enable 64-bit ACPI MFCG support for SGI UV2 platform x86, UV: Clean up uv_mmrs.h	2011-07-22 17:04:55 -07:00
Linus Torvalds	9e39264ed4	Merge branch 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, numa: Implement pfn -> nid mapping granularity check x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/	2011-07-22 17:04:18 -07:00
Linus Torvalds	7c6582b28a	Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mce: Use mce_sysdev_ prefix to group functions x86, mce: Use mce_chrdev_ prefix to group functions x86, mce: Cleanup mce_read() x86, mce: Cleanup mce_create()/remove_device() x86, mce: Check the result of ancient_init() x86, mce: Introduce mce_gather_info() x86, mce: Replace MCM_ with MCI_MISC_ x86, mce: Replace MCE_SELF_VECTOR by irq_work x86, mce, severity: Clean up trivial coding style problems x86, mce, severity: Cleanup severity table x86, mce, severity: Make formatting a bit more readable x86, mce, severity: Fix two severities table signatures	2011-07-22 17:03:40 -07:00
Linus Torvalds	35b004cce1	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, intel, power: Correct the MSR_IA32_ENERGY_PERF_BIAS message x86, msr: Fix typo in ENERGY_PERF_BIAS_POWERSAVE x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS	2011-07-22 17:02:54 -07:00
Linus Torvalds	eb47418dc5	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix write lock scalability 64-bit issue x86: Unify rwsem assembly implementation x86: Unify rwlock assembly implementation x86, asm: Fix binutils 2.16 issue with __USER32_CS x86, asm: Cleanup thunk_64.S x86, asm: Flip RESTORE_ARGS arguments logic x86, asm: Flip SAVE_ARGS arguments logic x86, asm: Thin down SAVE/RESTORE_* asm macros	2011-07-22 17:02:24 -07:00
Linus Torvalds	52de84f3f3	Merge branch 'timers-rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Serialize EFI time accesses on rtc_lock x86: Serialize SMP bootup CMOS accesses on rtc_lock rtc: stmp3xxx: Remove UIE handlers rtc: stmp3xxx: Get rid of mach-specific accessors rtc: stmp3xxx: Initialize drvdata before registering device rtc: stmp3xxx: Port stmp-functions to mxs-equivalents rtc: stmp3xxx: Restore register definitions rtc: vt8500: Use define instead of hardcoded value for status bit	2011-07-22 16:52:39 -07:00
Linus Torvalds	a99a7d1436	Merge branch 'timers-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: mips: Fix i8253 clockevent fallout i8253: Cleanup outb/inb magic arm: Footbridge: Use common i8253 clockevent mips: Use common i8253 clockevent x86: Use common i8253 clockevent i8253: Create common clockevent implementation i8253: Export i8253_lock unconditionally pcpskr: MIPS: Make config dependencies finer grained pcspkr: Cleanup Kconfig dependencies i8253: Move remaining content and delete asm/i8253.h i8253: Consolidate definitions of PIT_LATCH x86: i8253: Consolidate definitions of global_clock_event i8253: Alpha, PowerPC: Remove unused asm/8253pit.h alpha: i8253: Cleanup remaining users of i8253pit.h i8253: Remove I8253_LOCK config i8253: Make pcsp sound driver use the shared i8253_lock i8253: Make pcspkr input driver use the shared i8253_lock i8253: Consolidate all kernel definitions of i8253_lock i8253: Unify all kernel declarations of i8253_lock i8253: Create linux/i8253.h and use it in all 8253 related files	2011-07-22 16:51:56 -07:00
Linus Torvalds	4d4abdcb1d	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (123 commits) perf: Remove the nmi parameter from the oprofile_perf backend x86, perf: Make copy_from_user_nmi() a library function perf: Remove perf_event_attr::type check x86, perf: P4 PMU - Fix typos in comments and style cleanup perf tools: Make test use the preset debugfs path perf tools: Add automated tests for events parsing perf tools: De-opt the parse_events function perf script: Fix display of IP address for non-callchain path perf tools: Fix endian conversion reading event attr from file header perf tools: Add missing 'node' alias to the hw_cache[] array perf probe: Support adding probes on offline kernel modules perf probe: Add probed module in front of function perf probe: Introduce debuginfo to encapsulate dwarf information perf-probe: Move dwarf library routines to dwarf-aux.{c, h} perf probe: Remove redundant dwarf functions perf probe: Move strtailcmp to string.c perf probe: Rename DIE_FIND_CB_FOUND to DIE_FIND_CB_END tracing/kprobe: Update symbol reference when loading module tracing/kprobes: Support module init function probing kprobes: Return -ENOENT if probe point doesn't exist ...	2011-07-22 16:44:39 -07:00
Linus Torvalds	6d16d6d9bb	Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: iommu/core: Fix build with INTR_REMAP=y && CONFIG_DMAR=n iommu/amd: Don't use MSI address range for DMA addresses iommu/amd: Move missing parts to drivers/iommu iommu: Move iommu Kconfig entries to submenu x86/ia64: intel-iommu: move to drivers/iommu/ x86: amd_iommu: move to drivers/iommu/ msm: iommu: move to drivers/iommu/ drivers: iommu: move to a dedicated folder x86/amd-iommu: Store device alias as dev_data pointer x86/amd-iommu: Search for existind dev_data before allocting a new one x86/amd-iommu: Allow dev_data->alias to be NULL x86/amd-iommu: Use only dev_data in low-level domain attach/detach functions x86/amd-iommu: Use only dev_data for dte and iotlb flushing routines x86/amd-iommu: Store ATS state in dev_data x86/amd-iommu: Store devid in dev_data x86/amd-iommu: Introduce global dev_data_list x86/amd-iommu: Remove redundant device_flush_dte() calls iommu-api: Add missing header file Fix up trivial conflicts (independent additions close to each other) in drivers/Makefile and include/linux/pci.h	2011-07-22 16:39:42 -07:00
Linus Torvalds	17413f5acd	Merge branch 'for-3.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-3.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: Fixup __this_cpu_xchg* operations	2011-07-22 15:07:35 -07:00
Linus Torvalds	acb41c0f92	Merge branch 'of-pci' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'of-pci' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: pci/of: Consolidate pci_bus_to_OF_node() pci/of: Consolidate pci_device_to_OF_node() x86/devicetree: Use generic PCI <-> OF matching microblaze/pci: Move the remains of pci_32.c to pci-common.c microblaze/pci: Remove powermac originated cruft pci/of: Match PCI devices to OF nodes dynamically	2011-07-22 14:54:02 -07:00
Linus Torvalds	a7e1aabb28	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: lguest: Fix in/out emulation lguest: Fix translation count about wikipedia's cpuid page lguest: Fix three simple typos in comments lguest: update comments lguest: Simplify device initialization. lguest: don't rewrite vmcall instructions lguest: remove remaining vmcall lguest: use a special 1:1 linear pagetable mode until first switch. lguest: Do not exit on non-fatal errors	2011-07-22 13:45:50 -07:00
Jonas Bonn	a4e05276a1	asm-generic: move archictures to common delay.h This patch moves the in-tree architectures that were using the 'generic' delay.h over to using the header file in asm-generic. This is not done using the generic-y mechanism as none of these arch's have started using that mechanism yet. This is a trivial change to make later when the arch begins using generic-y. Note the subtle change to the avr32 and SH architectures where the argument to __const_udelay was previously using the rounded down constant value instead of the rounded up value. Signed-off-by: Jonas Bonn <jonas@southpole.se> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>	2011-07-22 18:46:24 +02:00
Rusty Russell	9f54288def	lguest: update comments Also removes a long-unused #define and an extraneous semicolon. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2011-07-22 14:39:50 +09:30
H. Peter Anvin	ae7bd11b47	clocksource: Change __ARCH_HAS_CLOCKSOURCE_DATA to a CONFIG option The machinery for __ARCH_HAS_CLOCKSOURCE_DATA assumed a file in asm-generic would be the default for architectures without their own file in asm/, but that is not how it works. Replace it with a Kconfig option instead. Link: http://lkml.kernel.org/r/4E288AA6.7090804@zytor.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Tony Luck <tony.luck@intel.com>	2011-07-21 13:34:05 -07:00
Robert Richter	1ac2e6ca44	x86, perf: Make copy_from_user_nmi() a library function copy_from_user_nmi() is used in oprofile and perf. Moving it to other library functions like copy_from_user(). As this is x86 code for 32 and 64 bits, create a new file usercopy.c for unified code. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110607172413.GJ20052@erda.amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-21 20:41:57 +02:00
Cyrill Gorcunov	f53173e47d	x86, perf: P4 PMU - Fix typos in comments and style cleanup This patch: - fixes typos in comments and clarifies the text - renames obscure p4_event_alias::original and ::alter members to ::original and ::alternative as appropriate - drops parenthesis from the return of p4_get_alias_event() No functional changes. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/r/20110721160625.GX7492@sun Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-21 20:41:54 +02:00
Jan Beulich	ac619f4eba	x86: Serialize SMP bootup CMOS accesses on rtc_lock With CPU hotplug, there is a theoretical race between other CMOS (namely RTC) accesses and those done in the SMP secondary processor bringup path. I am unware of the problem having been noticed by anyone in practice, but it would very likely be rather spurious and very hard to reproduce. So to be on the safe side, acquire rtc_lock around those accesses. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/4E257AE7020000780004E2FF@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-21 09:20:59 +02:00
Jan Beulich	a750036f35	x86: Fix write lock scalability 64-bit issue With the write lock path simply subtracting RW_LOCK_BIAS there is, on large systems, the theoretical possibility of overflowing the 32-bit value that was used so far (namely if 128 or more CPUs manage to do the subtraction, but don't get to do the inverse addition in the failure path quickly enough). A first measure is to modify RW_LOCK_BIAS itself - with the new value chosen, it is good for up to 2048 CPUs each allowed to nest over 2048 times on the read path without causing an issue. Quite possibly it would even be sufficient to adjust the bias a little further, assuming that allowing for significantly less nesting would suffice. However, as the original value chosen allowed for even more nesting levels, to support more than 2048 CPUs (possible currently only for 64-bit kernels) the lock itself gets widened to 64 bits. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4E258E0D020000780004E3F0@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-21 09:03:36 +02:00
Jan Beulich	4625cd6379	x86: Unify rwlock assembly implementation Rather than having two functionally identical implementations for 32- and 64-bit configurations, extend the existing assembly abstractions enough to fold the two rwlock implementations into a shared one. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4E258DD7020000780004E3EA@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-21 09:03:31 +02:00
Jeremy Fitzhardinge	c796f213a6	xen/trace: add multicall tracing Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-07-18 15:43:26 -07:00
Andy Lutomirski	98d0ac38ca	x86-64: Move vread_tsc and vread_hpet into the vDSO The vsyscall page now consists entirely of trap instructions. Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/637648f303f2ef93af93bae25186e9a1bea093f5.1310639973.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 17:57:05 -07:00
H. Peter Anvin	4bb82178f5	x86, msr: Fix typo in ENERGY_PERF_BIAS_POWERSAVE Fix a trivial typo in the name of the constant ENERGY_PERF_BIAS_POWERSAVE. This didn't cause trouble because this constant is not currently used for anything. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Len Brown <len.brown@intel.com> Link: http://lkml.kernel.org/r/tip-abe48b108247e9b90b4c6739662a2e5c765ed114@git.kernel.org	2011-07-14 14:58:44 -07:00
Cyrill Gorcunov	f912987097	perf, x86: P4 PMU - Introduce event alias feature Instead of hw_nmi_watchdog_set_attr() weak function and appropriate x86_pmu::hw_watchdog_set_attr() call we introduce even alias mechanism which allow us to drop this routines completely and isolate quirks of Netburst architecture inside P4 PMU code only. The main idea remains the same though -- to allow nmi-watchdog and perf top run simultaneously. Note the aliasing mechanism applies to generic PERF_COUNT_HW_CPU_CYCLES event only because arbitrary event (say passed as RAW initially) might have some additional bits set inside ESCR register changing the behaviour of event and we can't guarantee anymore that alias event will give the same result. P.S. Thanks a huge to Don and Steven for for testing and early review. Acked-by: Don Zickus <dzickus@redhat.com> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: Ingo Molnar <mingo@elte.hu> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Stephane Eranian <eranian@google.com> CC: Lin Ming <ming.m.lin@intel.com> CC: Arnaldo Carvalho de Melo <acme@redhat.com> CC: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20110708201712.GS23657@sun Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-07-14 17:25:04 -04:00
Len Brown	abe48b1082	x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS Since 2.6.36 (`23016bf0d2`), Linux prints the existence of "epb" in /proc/cpuinfo, Since 2.6.38 (`d5532ee7b4`), the x86_energy_perf_policy(8) utility has been available in-tree to update MSR_IA32_ENERGY_PERF_BIAS. However, the typical BIOS fails to initialize the MSR, presumably because this is handled by high-volume shrink-wrap operating systems... Linux distros, on the other hand, do not yet invoke x86_energy_perf_policy(8). As a result, WSM-EP, SNB, and later hardware from Intel will run in its default hardware power-on state (performance), which assumes that users care for performance at all costs and not for energy efficiency. While that is fine for performance benchmarks, the hardware's intended default operating point is "normal" mode... Initialize the MSR to the "normal" by default during kernel boot. x86_energy_perf_policy(8) is available to change the default after boot, should the user have a different preference. Signed-off-by: Len Brown <len.brown@intel.com> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1107140051020.18606@x980 Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@kernel.org>	2011-07-14 12:13:42 -07:00
Tejun Heo	24aa07882b	memblock, x86: Replace memblock_x86_reserve/free_range() with generic ones Other than sanity check and debug message, the x86 specific version of memblock reserve/free functions are simple wrappers around the generic versions - memblock_reserve/free(). This patch adds debug messages with caller identification to the generic versions and replaces x86 specific ones and kills them. arch/x86/include/asm/memblock.h and arch/x86/mm/memblock.c are empty after this change and removed. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-14-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:53 -07:00
Tejun Heo	c378ddd53f	memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option From 6839454ae63f1eb21e515c10229ca95c22955fec Mon Sep 17 00:00:00 2001 From: Tejun Heo <tj@kernel.org> Date: Thu, 14 Jul 2011 11:22:17 +0200 Make ARCH_DISCARD_MEMBLOCK a config option so that it can be handled together with other MEMBLOCK options. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110714094603.GH3455@htj.dyndns.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:52 -07:00
Tejun Heo	474b881bf4	x86: Use absent_pages_in_range() instead of memblock_x86_hole_size() memblock_x86_hole_size() calculates the total size of holes in a given range according to memblock and is used by numa emulation code and numa_meminfo_cover_memory(). Since conversion to MEMBLOCK_NODE_MAP, absent_pages_in_range() also uses memblock and gives the same result. This patch replaces memblock_x86_hole_size() uses with absent_pages_in_range(). After the conversion the x86 function doesn't have any user left and is killed. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-12-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:51 -07:00
Tejun Heo	6b5d41a1b9	memblock, x86: Reimplement memblock_find_dma_reserve() using iterators memblock_find_dma_reserve() wants to find out how much memory is reserved under MAX_DMA_PFN. memblock_x86_memory_[free_]in_range() are used to find out the amounts of all available and free memory in the area, which are then subtracted to find out the amount of reservation. memblock_x86_memblock_[free_]in_range() are implemented using __memblock_x86_memory_in_range() which builds ranges from memblock and then count them, which is rather unnecessarily complex. This patch open codes the counting logic directly in memblock_find_dma_reserve() using memblock iterators and removes now unused __memblock_x86_memory_in_range() and find_range_array(). Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-11-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:50 -07:00
Tejun Heo	8a9ca34c11	memblock, x86: Replace __get_free_all_memory_range() with for_each_free_mem_range() __get_free_all_memory_range() walks memblock, calculates free memory areas and fills in the specified range. It can be easily replaced with for_each_free_mem_range(). Convert free_low_memory_core_early() and add_highpages_with_active_regions() to for_each_free_mem_range(). This leaves __get_free_all_memory_range() without any user. Kill it and related functions. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-10-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:49 -07:00
Tejun Heo	64a02daacb	memblock, x86: Make free_all_memory_core_early() explicitly free lowmem only nomemblock is currently used only by x86 and on x86_32 free_all_memory_core_early() silently freed only the low mem because get_free_all_memory_range() in arch/x86/mm/memblock.c implicitly limited range to max_low_pfn. Rename free_all_memory_core_early() to free_low_memory_core_early() and make it call __get_free_all_memory_range() and limit the range to max_low_pfn explicitly. This makes things clearer and also is consistent with the bootmem behavior. This leaves get_free_all_memory_range() without any user. Kill it. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-9-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:49 -07:00
Tejun Heo	8d89ac8084	x86: Replace memblock_x86_find_in_range_size() with for_each_free_mem_range() setup_bios_corruption_check() and memtest do_one_pass() open code memblock free area iteration using memblock_x86_find_in_range_size(). Convert them to use for_each_free_mem_range() instead. This leaves memblock_x86_find_in_range_size() and memblock_x86_check_reserved_size() unused. Kill them. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-8-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:48 -07:00
Tejun Heo	ab5d140b9e	x86: Use __memblock_alloc_base() in early_reserve_e820() early_reserve_e820() implements its own ad-hoc early allocator using memblock_x86_find_in_range_size(). Use __memblock_alloc_base() instead and remove the unnecessary @startt parameter (it's top-down allocation anyway). Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-6-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:47 -07:00
Tejun Heo	0608f70c78	x86: Use HAVE_MEMBLOCK_NODE_MAP From 5732e1247898d67cbf837585150fe9f68974671d Mon Sep 17 00:00:00 2001 From: Tejun Heo <tj@kernel.org> Date: Thu, 14 Jul 2011 11:22:16 +0200 Convert x86 to HAVE_MEMBLOCK_NODE_MAP. The only difference in memory handling is that allocations can't no longer cross node boundaries whether they're node affine or not, which shouldn't matter at all. This conversion will enable further simplification of boot memory handling. -v2: Fix build failure on !NUMA configurations discovered by hpa. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110714094423.GG3455@htj.dyndns.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:43 -07:00
Tejun Heo	eb40c4c27f	memblock, x86: Replace memblock_x86_find_in_range_node() with generic memblock calls With the previous changes, generic NUMA aware memblock API has feature parity with memblock_x86_find_in_range_node(). There currently are two users - x86 setup_node_data() and __alloc_memory_core_early() in nobootmem.c. This patch converts the former to use memblock_alloc_nid() and the latter memblock_find_range_in_node(), and kills memblock_x86_find_in_range_node() and related functions including find_memory_early_core_early() in page_alloc.c. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310460395-30913-9-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:45:35 -07:00
Glauber Costa	3c404b578f	KVM guest: Add a pv_ops stub for steal time This patch adds a function pointer in one of the many paravirt_ops structs, to allow guests to register a steal time function. Besides a steal time function, we also declare two jump_labels. They will be used to allow the steal time code to be easily bypassed when not in use. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-14 12:59:44 +03:00
Glauber Costa	c9aaa8957f	KVM: Steal time implementation To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM, while the vcpu had meaningful work to do - halt time does not count. This information is acquired through the run_delay field of delayacct/schedstats infrastructure, that counts time spent in a runqueue but not running. Steal time is a per-cpu information, so the traditional MSR-based infrastructure is used. A new msr, KVM_MSR_STEAL_TIME, holds the memory area address containing information about steal time This patch contains the hypervisor part of the steal time infrasructure, and can be backported independently of the guest portion. [avi, yongjie: export delayacct_on, to avoid build failures in some configs] Signed-off-by: Glauber Costa <glommer@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Rik van Riel <riel@redhat.com> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Yongjie Ren <yongjie.ren@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-14 12:59:14 +03:00
Andy Lutomirski	433bd805e5	clocksource: Replace vread with generic arch data The vread field was bloating struct clocksource everywhere except x86_64, and I want to change the way this works on x86_64, so let's split it out into per-arch data. Cc: x86@kernel.org Cc: Clemens Ladisch <clemens@ladisch.de> Cc: linux-ia64@vger.kernel.org Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/3ae5ec76a168eaaae63f08a2a1060b91aa0b7759.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-13 11:23:12 -07:00
Andy Lutomirski	59e97e4d6f	x86: Make alternative instruction pointers relative This save a few bytes on x86-64 and means that future patches can apply alternatives to unrelocated code. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/ff64a6b9a1a3860ca4a7b8b6dc7b4754f9491cd7.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-13 11:22:56 -07:00
Andy Lutomirski	c9712944b2	x86-64: Improve vsyscall emulation CS and RIP handling Three fixes here: - Send SIGSEGV if called from compat code or with a funny CS. - Don't BUG on impossible addresses. - Add a missing local_irq_disable. This patch also removes an unused variable. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/6fb2b13ab39b743d1e4f466eef13425854912f7f.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-13 11:22:55 -07:00
Tejun Heo	d0ead15738	x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ DISCONTIGMEM on x86-32 implements pfn -> nid mapping similarly to SPARSEMEM; however, it calls each mapping unit ELEMENT instead of SECTION. This patch renames it to SECTION so that PAGES_PER_SECTION is valid for both DISCONTIGMEM and SPARSEMEM. This will be used by the next patch to implement mapping granularity check. This patch is trivial constant rename. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110712074422.GA2872@htj.dyndns.org Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-12 21:58:11 -07:00
Christoph Lameter	688d3be815	percpu: Fixup __this_cpu_xchg* operations Somehow we got into a situation where the __this_cpu_xchg() operations were not defined in the same way as this_cpu_xchg() and friends. I had some build failures under 32 bit that were addressed by these fixes. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-07-12 13:47:16 +02:00
Glauber Costa	9ddabbe72e	KVM: KVM Steal time guest/host interface To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM. This is per-vcpu, and using the kvmclock structure for that is an abuse we decided not to make. In this patchset, I am introducing a new msr, KVM_MSR_STEAL_TIME, that holds the memory area address containing information about steal time This patch contains the headers for it. I am keeping it separate to facilitate backports to people who wants to backport the kernel part but not the hypervisor, or the other way around. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:17:03 +03:00
Glauber Costa	4b6b35f55c	KVM: Add constant to represent KVM MSRs enabled bit in guest/host interface This patch is simple, put in a different commit so it can be more easily shared between guest and hypervisor. It just defines a named constant to indicate the enable bit for KVM-specific MSRs. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:17:02 +03:00
Avi Kivity	411c588dfb	KVM: MMU: Adjust shadow paging to work when SMEP=1 and CR0.WP=0 When CR0.WP=0, we sometimes map user pages as kernel pages (to allow the kernel to write to them). Unfortunately this also allows the kernel to fetch from these pages, even if CR4.SMEP is set. Adjust for this by also setting NX on the spte in these circumstances. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:26 +03:00
Yang, Wei	d9c3476d8a	KVM: Remove RDWRGSFS bit from CR4_RESERVED_BITS This patch removes RDWRGSFS bit from CR4_RESERVED_BITS. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:22 +03:00
Yang, Wei Y	8d9c975fc5	KVM: Remove SMEP bit from CR4_RESERVED_BITS This patch removes SMEP bit from CR4_RESERVED_BITS. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Shan, Haitao <haitao.shan@intel.com> Signed-off-by: Li, Xin <xin.li@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:12 +03:00
Avi Kivity	9dac77fa40	KVM: x86 emulator: fold decode_cache into x86_emulate_ctxt This saves a lot of pointless casts x86_emulate_ctxt and decode_cache. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:09 +03:00
Avi Kivity	36dd9bb5ce	KVM: x86 emulator: rename decode_cache::eip to _eip The name eip conflicts with a field of the same name in x86_emulate_ctxt, which we plan to fold decode_cache into. The name _eip is unfortunate, but what's really needed is a refactoring here, not a better name. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:09 +03:00
Nadav Har'El	7c1779384a	KVM: nVMX: vmcs12 checks on nested entry This patch adds a bunch of tests of the validity of the vmcs12 fields, according to what the VMX spec and our implementation allows. If fields we cannot (or don't want to) honor are discovered, an entry failure is emulated. According to the spec, there are two types of entry failures: If the problem was in vmcs12's host state or control fields, the VMLAUNCH instruction simply fails. But a problem is found in the guest state, the behavior is more similar to that of an exit. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:16 +03:00
Nadav Har'El	4704d0befb	KVM: nVMX: Exiting from L2 to L1 This patch implements nested_vmx_vmexit(), called when the nested L2 guest exits and we want to run its L1 parent and let it handle this exit. Note that this will not necessarily be called on every L2 exit. L0 may decide to handle a particular exit on its own, without L1's involvement; In that case, L0 will handle the exit, and resume running L2, without running L1 and without calling nested_vmx_vmexit(). The logic for deciding whether to handle a particular exit in L1 or in L0, i.e., whether to call nested_vmx_vmexit(), will appear in a separate patch below. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:16 +03:00
Nadav Har'El	0140caea3b	KVM: nVMX: Success/failure of VMX instructions. VMX instructions specify success or failure by setting certain RFLAGS bits. This patch contains common functions to do this, and they will be used in the following patches which emulate the various VMX instructions. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:12 +03:00
Nadav Har'El	b87a51ae28	KVM: nVMX: Implement reading and writing of VMX MSRs When the guest can use VMX instructions (when the "nested" module option is on), it should also be able to read and write VMX MSRs, e.g., to query about VMX capabilities. This patch adds this support. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:11 +03:00
Nadav Har'El	5e1746d620	KVM: nVMX: Allow setting the VMXE bit in CR4 This patch allows the guest to enable the VMXE bit in CR4, which is a prerequisite to running VMXON. Whether to allow setting the VMXE bit now depends on the architecture (svm or vmx), so its checking has moved to kvm_x86_ops->set_cr4(). This function now returns an int: If kvm_x86_ops->set_cr4() returns 1, __kvm_set_cr4() will also return 1, and this will cause kvm_set_cr4() will throw a #GP. Turning on the VMXE bit is allowed only when the nested VMX feature is enabled, and turning it off is forbidden after a vmxon. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:10 +03:00
Takuya Yoshikawa	b5c9ff731f	KVM: x86 emulator: Avoid clearing the whole decode_cache During tracing the emulator, we noticed that init_emulate_ctxt() sometimes took a bit longer time than we expected. This patch is for mitigating the problem by some degree. By looking into the function, we soon notice that it clears the whole decode_cache whose size is about 2.5K bytes now. Furthermore, most of the bytes are taken for the two read_cache arrays, which are used only by a few instructions. Considering the fact that we are not assuming the cache arrays have been cleared when we store actual data, we do not need to clear the arrays: 2K bytes elimination. In addition, we can avoid clearing the fetch_cache and regs arrays. This patch changes the initialization not to clear the arrays. On our 64-bit host, init_emulate_ctxt() becomes 0.3 to 0.5us faster with this patch applied. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 11:45:09 +03:00
Xiao Guangrong	67052b3508	KVM: MMU: remove the arithmetic of parent pte rmap Parent pte rmap and page rmap are very similar, so use the same arithmetic for them Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:07 +03:00
Xiao Guangrong	53c07b1878	KVM: MMU: abstract the operation of rmap Abstract the operation of rmap to spte_list, then we can use it for the reverse mapping of parent pte in the later patch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:06 +03:00
Xiao Guangrong	332b207d65	KVM: MMU: optimize pte write path if don't have protected sp Simply return from kvm_mmu_pte_write path if no shadow page is write-protected, then we can avoid to walk all shadow pages and hold mmu-lock Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:02 +03:00
Avi Kivity	5e520e6278	KVM: VMX: Move VMREAD cleanup to exception handler We clean up a failed VMREAD by clearing the output register. Do it in the exception handler instead of unconditionally. This is worthwhile since there are more than a hundred call sites. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:00 +03:00
Takuya Yoshikawa	7b105ca290	KVM: x86 emulator: Stop passing ctxt->ops as arg of emul functions Dereference it in the actual users. This not only cleans up the emulator but also makes it easy to convert the old emulation functions to the new em_xxx() form later. Note: Remove some inline keywords to let the compiler decide inlining. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:44:59 +03:00
Konrad Rzeszutek Wilk	a0ee056709	xen/pci: Squash pci_xen_initial_domain and xen_setup_pirqs together. Since they are only called once and the rest of the pci_xen_* functions follow the same pattern of setup. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-07-11 13:19:30 -04:00
Steven Rostedt	e08fbb78f0	tracing, x86/irq: Do not trace arch_local_{,irq_}() functions I triggered a triple fault with gcc 4.5.1 because it did not honor the inline annotation to arch_local_save_flags() function and that function was added to the pool of functions traced by the function tracer. When preempt_schedule() called arch_local_save_flags() (called by irqs_disabled()), it was traced, but the first thing the function tracer does is disable preemption. When it enables preemption, the NEED_RESCHED flag will not have been cleared and the preemption check will trigger the call to preempt_schedule() again. Although the dynamic function tracer crashed immediately, the static version of the function tracer (CONFIG_DYNAMIC_FTRACE is not set) actually was able to show where the problem was. swapper-1 3.N.. 103885us : arch_local_save_flags <-preempt_schedule swapper-1 3.N.. 103886us : arch_local_save_flags <-preempt_schedule swapper-1 3.N.. 103886us : arch_local_save_flags <-preempt_schedule swapper-1 3.N.. 103887us : arch_local_save_flags <-preempt_schedule swapper-1 3.N.. 103887us : arch_local_save_flags <-preempt_schedule swapper-1 3.N.. 103888us : arch_local_save_flags <-preempt_schedule swapper-1 3.N.. 103888us : arch_local_save_flags <-preempt_schedule It went on for a while before it triple faulted with a corrupted stack. The arch_local_save_flags and arch_local_irq_* functions should not be traced. Even though they are marked as inline, gcc may still make them a function and enable tracing of them. The simple solution is to just mark them as notrace. I had to add the <linux/types.h> for this file to include the notrace tag. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20110702033852.733414762@goodmis.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-07 19:22:32 +02:00
Ingo Molnar	b395fb36d5	Merge branch 'iommu-3.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu into core/iommu	2011-07-07 12:58:28 +02:00
Daniel Drake	7bc74b3df7	x86, olpc-xo1-sci: Add GPE handler and ebook switch functionality The EC in the OLPC XO-1 delivers GPE events to provide various notifications. Add the basic code for GPE/EC event processing and enable the ebook switch, which can be used as a wakeup source. Signed-off-by: Daniel Drake <dsd@laptop.org> Link: http://lkml.kernel.org/r/1309019658-1712-8-git-send-email-dsd@laptop.org Acked-by: Andres Salomon <dilinger@queued.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-06 14:44:38 -07:00
Daniel Drake	bc4ecd5a5e	x86, olpc: EC SCI wakeup mask functionality Update the EC SCI masks with recent additions. Add functions to query SCI events and set the wakeup mask, to be used by followup patches. Add functions to tweak an event mask used to select certain EC events as a system wakeup source. Also add a function to determine if EC wakeup functionality is available, as this depends on child drivers (different for each laptop model) to configure the SCI interrupt. Signed-off-by: Daniel Drake <dsd@laptop.org> Link: http://lkml.kernel.org/r/1309019658-1712-7-git-send-email-dsd@laptop.org Acked-by: Andres Salomon <dilinger@queued.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-06 14:44:36 -07:00
Daniel Drake	97c4cb71c1	x86, olpc: Add XO-1 suspend/resume support Add code needed for basic suspend/resume of the XO-1 laptop. Based on earlier work by Jordan Crouse, Andres Salomon, and others. This patch incorporates all earlier feedback from Thomas Gleixner. To clarify a certain point (now more obvious in the code itself): On resume, OpenFirmware returns execution to Linux in protected mode with a kernel-compatible GDT already set up. The changes and simplifications suggested have all been included. Signed-off-by: Daniel Drake <dsd@laptop.org> Link: http://lkml.kernel.org/r/1309019658-1712-5-git-send-email-dsd@laptop.org Acked-by: Andres Salomon <dilinger@queued.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-06 14:44:32 -07:00
Ingo Molnar	729aa21ab8	Merge branch 'perf/stacktrace' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core	2011-07-03 20:39:40 +02:00
Frederic Weisbecker	9e46294dad	x86: Save stack pointer in perf live regs savings In order to prepare for fetching the stack pointer from the regs when possible in dump_trace() instead of taking the local one, save the current stack pointer in perf live regs saving. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>	2011-07-02 18:04:03 +02:00
Tejun Heo	a26474e864	x86-32, NUMA: Fix boot regression caused by NUMA init unification on highmem machines During 32/64 NUMA init unification, commit `797390d855` ("x86-32, NUMA: use sparse_memory_present_with_active_regions()") made 32bit mm init call memory_present() automatically from active_regions instead of leaving it to each NUMA init path. This commit description is inaccurate - memory_present() calls aren't the same for flat and numaq. After the commit, memory_present() is only called for the intersection of e820 and NUMA layout. Before, on flatmem, memory_present() would be called from 0 to max_pfn. After, it would be called only on the areas that e820 indicates to be populated. This is how x86_64 works and should be okay as memmap is allowed to contain holes; however, x86_32 DISCONTIGMEM is missing early_pfn_valid(), which makes memmap_init_zone() assume that memmap doesn't contain any hole. This leads to the following oops if e820 map contains holes as it often does on machine with near or more 4GiB of memory by calling pfn_to_page() on a pfn which isn't mapped to a NUMA node, a reported by Conny Seidel: BUG: unable to handle kernel paging request at 000012b0 IP: [<c1aa13ce>] memmap_init_zone+0x6c/0xf2 pdpt =3D 0000000000000000 pde =3D f000eef3f000ee00 Oops: 0000 [#1] SMP last sysfs file: Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00164-g797390d #1 To Be Filled By O.E.M. To Be Filled By O.E.M./E350M1 EIP: 0060:[<c1aa13ce>] EFLAGS: 00010012 CPU: 0 EIP is at memmap_init_zone+0x6c/0xf2 EAX: 00000000 EBX: 000a8000 ECX: 000a7fff EDX: f2c00b80 ESI: 000a8000 EDI: f2c00800 EBP: c19ffe54 ESP: c19ffe34 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 Process swapper (pid: 0, ti=3Dc19fe000 task=3Dc1a07f60 task.ti=3Dc19fe000) Stack: 00000002 00000000 0023f000 00000000 10000000 00000a00 f2c00000 f2c00b58 c19ffeb0 c1a80f24 000375fe 00000000 f2c00800 00000800 00000100 00000030 c1abb768 0000003c 00000000 00000000 00000004 00207a02 f2c00800 000375fe Call Trace: [<c1a80f24>] free_area_init_node+0x358/0x385 [<c1a81384>] free_area_init_nodes+0x420/0x487 [<c1a79326>] paging_init+0x114/0x11b [<c1a6cb13>] setup_arch+0xb37/0xc0a [<c1a69554>] start_kernel+0x76/0x316 [<c1a690a8>] i386_start_kernel+0xa8/0xb0 This patch fixes the bug by defining early_pfn_valid() to be the same as pfn_valid() when DISCONTIGMEM. Reported-bisected-and-tested-by: Conny Seidel <conny.seidel@amd.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: hans.rosenfeld@amd.com Cc: Christoph Lameter <cl@linux.com> Cc: Conny Seidel <conny.seidel@amd.com> Link: http://lkml.kernel.org/r/20110628094107.GB3386@htj.dyndns.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-01 13:38:51 +02:00
Jesper Juhl	e376fd664b	watchdog: Intel SCU Watchdog: Fix build and remove duplicate code Trying to build the Intel SCU Watchdog fails for me with gcc 4.6.0 - $ gcc --version \| head -n 1 gcc (GCC) 4.6.0 20110513 (prerelease) like this : CC drivers/watchdog/intel_scu_watchdog.o In file included from drivers/watchdog/intel_scu_watchdog.c:49:0: /home/jj/src/linux-2.6/arch/x86/include/asm/apb_timer.h: In function ‘apbt_time_init’: /home/jj/src/linux-2.6/arch/x86/include/asm/apb_timer.h:65:42: warning: ‘return’ with a value, in function returning void [enabled by default] drivers/watchdog/intel_scu_watchdog.c: In function ‘intel_scu_watchdog_init’: drivers/watchdog/intel_scu_watchdog.c:468:2: error: implicit declaration of function ‘sfi_get_mtmr’ [-Werror=implicit-function-declaration] drivers/watchdog/intel_scu_watchdog.c:468:32: warning: assignment makes pointer from integer without a cast [enabled by default] cc1: some warnings being treated as errors make[1]: * [drivers/watchdog/intel_scu_watchdog.o] Error 1 make: * [drivers/watchdog/intel_scu_watchdog.o] Error 2 Additionally, linux/types.h is needlessly being included twice in drivers/watchdog/intel_scu_watchdog.c Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>	2011-06-28 07:42:50 +00:00
Jamie Iles	06c3df4952	clocksource: apb: Share APB timer code with other platforms The APB timers are an IP block from Synopsys (DesignWare APB timers) and are also found in other systems including ARM SoC's. This patch adds functions for creating clock_event_devices and clocksources from APB timers but does not do the resource allocation. This is handled in a higher layer to allow the timers to be created from multiple methods such as platform_devices. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Jamie Iles <jamie@jamieiles.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2011-06-27 15:16:21 -07:00
KAMEZAWA Hiroyuki	c6830c2260	Fix node_start/end_pfn() definition for mm/page_cgroup.c commit `21a3c96` uses node_start/end_pfn(nid) for detection start/end of nodes. But, it's not defined in linux/mmzone.h but defined in /arch/???/include/mmzone.h which is included only under CONFIG_NEED_MULTIPLE_NODES=y. Then, we see mm/page_cgroup.c: In function 'page_cgroup_init': mm/page_cgroup.c:308: error: implicit declaration of function 'node_start_pfn' mm/page_cgroup.c:309: error: implicit declaration of function 'node_end_pfn' So, fixiing page_cgroup.c is an idea... But node_start_pfn()/node_end_pfn() is a very generic macro and should be implemented in the same manner for all archs. (m32r has different implementation...) This patch removes definitions of node_start/end_pfn() in each archs and defines a unified one in linux/mmzone.h. It's not under CONFIG_NEED_MULTIPLE_NODES, now. A result of macro expansion is here (mm/page_cgroup.c) for !NUMA start_pfn = ((&contig_page_data)->node_start_pfn); end_pfn = ({ pg_data_t __pgdat = (&contig_page_data); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;}); for NUMA (x86-64) start_pfn = ((node_data[nid])->node_start_pfn); end_pfn = ({ pg_data_t __pgdat = (node_data[nid]); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;}); Changelog: - fixed to avoid using "nid" twice in node_end_pfn() macro. Reported-and-acked-by: Randy Dunlap <randy.dunlap@oracle.com> Reported-and-tested-by: Ingo Molnar <mingo@elte.hu> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-27 14:13:09 -07:00
Christoph Lameter	3824abd127	x86: Add support for cmpxchg_double A simple implementation that only supports the word size and does not have a fallback mode (would require a spinlock). Add 32 and 64 bit support for cmpxchg_double. cmpxchg double uses the cmpxchg8b or cmpxchg16b instruction on x86 processors to compare and swap 2 machine words. This allows lockless algorithms to move more context information through critical sections. Set a flag CONFIG_CMPXCHG_DOUBLE to signal that support for double word cmpxchg detection has been build into the kernel. Note that each subsystem using cmpxchg_double has to implement a fall back mechanism as long as we offer support for processors that do not implement cmpxchg_double. Reviewed-by: H. Peter Anvin <hpa@zytor.com> Cc: Tejun Heo <tj@kernel.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <cl@linux.com> Link: http://lkml.kernel.org/r/20110601172614.173427964@linux.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-06-25 12:17:32 -07:00
cpw@sgi.com	ae90c232be	x86, UV: Correct UV2 BAU destination timeout Correct the UV2 broacast assist unit's destination timeout period. And the activation status register in UV2 should be tested for a destination timeout with a 4, not a 2. The values for Active versus Timeout were reversed. This patch is critical for TLB shootdown on an Altix UV2 system (i.e. the follow-on to the current Altix UV). Destination timeout period: The period is set in 4 bits of memory-mapped register MISC_CONTROL. The left bit toggles base period between 10us and 80us. The other 3 bits are the multiplier. Decimal 15, hex f, gives the maximum: 7 * 80us Signed-off-by: Cliff Wickman <cpw@sgi.com> Link: http://lkml.kernel.org/r/20110621122243.117324443@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-06-21 14:50:34 +02:00
cpw@sgi.com	442d392492	x86, UV: Remove cpumask_t from the stack Remove the large stack-resident cpumask_t from reset_with_ipi()'s stack by allocating one per uvhub. Due to the limited size of the stack the potentially huge cpumask_t may cause stack overrun. We haven't seen it happen yet, but we need to make it a practice not to push such structures onto the stack. Signed-off-by: Cliff Wickman <cpw@sgi.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20110621122242.832589130@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-06-21 14:50:33 +02:00
cpw@sgi.com	a456eaab87	x86, UV: Rename hubmask to pnmask Rename 'bau_targ_hubmask' to 'pnmask' for clarity. The BAU distribution bit mask is indexed by pnode number, not hub or blade number. This important fact is not clear while the mask is called a 'hubmask'. Signed-off-by: Cliff Wickman <cpw@sgi.com> Link: http://lkml.kernel.org/r/20110621122242.630995969@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-06-21 14:50:32 +02:00
cpw@sgi.com	b18fb2c04a	x86, UV: Inline header file functions Make all the functions in uv_bau.h inline so that it can be included in the fake prom (used in simulations). If not inlined the unused functions will generate compiler warnings. Signed-off-by: Cliff Wickman <cpw@sgi.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/20110621122242.230529678@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-06-21 14:50:31 +02:00
Joerg Roedel	801019d59d	Merge branches 'amd/transparent-bridge' and 'core' Conflicts: arch/x86/include/asm/amd_iommu_types.h arch/x86/kernel/amd_iommu.c Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-06-21 11:14:10 +02:00
Joerg Roedel	403f81d8ee	iommu/amd: Move missing parts to drivers/iommu A few parts of the driver were missing in drivers/iommu. Move them there to have the complete driver in that directory. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-06-21 10:49:31 +02:00
Linus Torvalds	10e18e6230	Merge branch 'kvm-updates/3.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/3.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Fix register corruption in pvclock_scale_delta KVM: MMU: fix opposite condition in mapping_level_dirty_bitmap KVM: VMX: do not overwrite uptodate vcpu->arch.cr3 on KVM_SET_SREGS KVM: MMU: Fix build warnings in walk_addr_generic()	2011-06-20 08:58:07 -07:00
Zachary Amsden	de2d1a524e	KVM: Fix register corruption in pvclock_scale_delta The 128-bit multiply in pvclock.h was missing an output constraint for EDX which caused a register corruption to appear. Thanks to Ulrich for diagnosing the EDX corruption and Avi for providing this fix. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-06-19 19:23:14 +03:00
Maarten Lankhorst	7d68dc3f10	x86, efi: Do not reserve boot services regions within reserved areas Commit `916f676f8d` started reserving boot service code since some systems require you to keep that code around until SetVirtualAddressMap is called. However, in some cases those areas will overlap with reserved regions. The proper medium-term fix is to fix the bootloader to prevent the conflicts from occurring by moving the kernel to a better position, but the kernel should check for this possibility, and only reserve regions which can be reserved. Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com> Link: http://lkml.kernel.org/r/4DF7A005.1050407@gmail.com Acked-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-06-18 22:48:49 +02:00
Hidetoshi Seto	c7cece89f1	x86, mce: Use mce_sysdev_ prefix to group functions There are many functions named mce_* so use a new prefix for the subset of functions related to sysfs support. And since `f3c6ea1b06` introduces syscore_ops, use the prefix mce_syscore for some functions related to power management which were in sysdev_class before. Before: After: mce_device mce_sysdev mce_sysclass mce_sysdev_class mce_attrs mce_sysdev_attrs mce_dev_initialized mce_sysdev_initialized mce_create_device mce_sysdev_create mce_remove_device mce_sysdev_remove mce_suspend mce_syscore_suspend mce_shutdown mce_syscore_shutdown mce_resume mce_syscore_resume Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4DEED81B.8020506@jp.fujitsu.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-06-16 12:10:16 +02:00
Hidetoshi Seto	2b90e77eae	x86, mce: Replace MCM_ with MCI_MISC_ Follow other MCi register defines. Plus define MCI_MISC_ADDR_LSB() and MCI_MISC_ADDR_MODE(). Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4DEED6E8.9090509@jp.fujitsu.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-06-16 12:10:10 +02:00
Hidetoshi Seto	b77e70bf35	x86, mce: Replace MCE_SELF_VECTOR by irq_work The MCE handler uses a special vector for self IPI to invoke post-emergency processing in an interrupt context, e.g. call an NMI-unsafe function, wakeup loggers, schedule time-consuming work for recovery, etc. This mechanism is now generalized by the following commit: > `e360adbe29` > Author: Peter Zijlstra <a.p.zijlstra@chello.nl> > Date: Thu Oct 14 14:01:34 2010 +0800 > > irq_work: Add generic hardirq context callbacks > > Provide a mechanism that allows running code in IRQ context. It is > most useful for NMI code that needs to interact with the rest of the > system -- like wakeup a task to drain buffers. : So change to use provided generic mechanism. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4DEED6B2.6080005@jp.fujitsu.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-06-16 12:10:08 +02:00
Joerg Roedel	71f7758090	x86/amd-iommu: Store device alias as dev_data pointer This finally allows PCI-Device-IDs to be handled by the IOMMU driver that have no corresponding struct device present in the system. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-06-14 12:49:58 +02:00
Joerg Roedel	ec9e79ef06	x86/amd-iommu: Use only dev_data in low-level domain attach/detach functions With this patch the low-level attach/detach functions only work on dev_data structures. This allows to remove the dev_data->dev pointer. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-06-14 12:49:58 +02:00
Joerg Roedel	ea61cddb9d	x86/amd-iommu: Store ATS state in dev_data This allows the low-level functions to operate on dev_data exclusivly later. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-06-14 12:49:57 +02:00
Joerg Roedel	f62dda66b5	x86/amd-iommu: Store devid in dev_data This allows to use dev_data independent of struct device later. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-06-14 12:49:57 +02:00
Joerg Roedel	8fa5f802ab	x86/amd-iommu: Introduce global dev_data_list This list keeps all allocated iommu_dev_data structs in a list together. This is needed for instances that have no associated device. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-06-14 12:49:57 +02:00
Ralf Baechle	850492760c	i8253: Move remaining content and delete asm/i8253.h Move setup_pit_timer() declaration to the common header file and remove the arch specific ones. [ tglx: Move it to linux/i8253.h instead of asm/mips and asm/x86 ] Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: linux-arm-kernel@lists.infradead.org Cc: Russell King <linux@arm.linux.org.uk> Cc: linux-mips@linux-mips.org Cc: Sergei Shtylyov <sshtylyov@mvista.com Link: http://lkml.kernel.org/r/20110601180610.913463093@duck.linux-mips.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-06-09 15:01:40 +02:00
Ralf Baechle	49cf3f29a1	i8253: Consolidate definitions of PIT_LATCH x86 defines PIT_LATCH as LATCH which in <linux/timex.h> is defined as ((CLOCK_TICK_RATE + HZ/2) / HZ) and <asm/timex.h> again defines CLOCK_TICK_RATE as PIT_TICK_RATE. MIPS defines PIT_LATCH as LATCH which in <linux/timex.h> is defined as ((CLOCK_TICK_RATE + HZ/2) / HZ) and <asm/timex.h> again defines CLOCK_TICK_RATE as 1193182. ARM defines PITCH_LATCH as ((PIT_TICK_RATE + HZ / 2) / HZ) - and that's the sanest thing and equivalent to above definitions so use that as the new definition in <linux/i8253.h>. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@linux-mips.org Link: http://lkml.kernel.org/r/20110601180610.832810002@duck.linux-mips.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-06-09 15:01:40 +02:00
Ralf Baechle	16f871bc30	x86: i8253: Consolidate definitions of global_clock_event There are multiple declarations of global_clock_event in header files specific to particular clock event implementations. Consolidate them in <asm/time.h> and make sure all users include that header. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Venkatesh Pallipadi (Venki) <venki@google.com> Link: http://lkml.kernel.org/r/20110601180610.762763451@duck.linux-mips.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-06-09 15:01:40 +02:00
Ralf Baechle	cb2455aa27	i8253: Unify all kernel declarations of i8253_lock Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@linux-mips.org Link: http://lkml.kernel.org/r/20110601180610.134151920@duck.linux-mips.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-06-09 15:01:38 +02:00
Benjamin Herrenschmidt	ef3b4f8cc2	pci/of: Consolidate pci_bus_to_OF_node() The generic code always get the device-node in the right place now so a single implementation will work for all archs Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Michal Simek <monstr@monstr.eu> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2011-06-08 09:08:57 +10:00
Benjamin Herrenschmidt	64099d981c	pci/of: Consolidate pci_device_to_OF_node() All archs do more or less the same thing now, move it into a single generic place. I chose pci.h rather than of_pci.h to avoid having to change all call-sites to include the later. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Michal Simek <monstr@monstr.eu> Acked-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2011-06-08 09:08:43 +10:00
Andy Lutomirski	5cec93c216	x86-64: Emulate legacy vsyscalls There's a fair amount of code in the vsyscall page. It contains a syscall instruction (in the gettimeofday fallback) and who knows what will happen if an exploit jumps into the middle of some other code. Reduce the risk by replacing the vsyscalls with short magic incantations that cause the kernel to emulate the real vsyscalls. These incantations are useless if entered in the middle. This causes vsyscalls to be a little more expensive than real syscalls. Fortunately sensible programs don't use them. The only exception is time() which is still called by glibc through the vsyscall - but calling time() millions of times per second is not sensible. glibc has this fixed in the development tree. This patch is not perfect: the vread_tsc and vread_hpet functions are still at a fixed address. Fixing that might involve making alternative patching work in the vDSO. Signed-off-by: Andy Lutomirski <luto@mit.edu> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jesper Juhl <jj@chaosbits.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Mikael Pettersson <mikpe@it.uu.se> Cc: Andi Kleen <andi@firstfloor.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: Valdis.Kletnieks@vt.edu Cc: pageexec@freemail.hu Link: http://lkml.kernel.org/r/e64e1b3c64858820d12c48fa739efbd1485e79d5.1307292171.git.luto@mit.edu [ Removed the CONFIG option - it's simpler to just do it unconditionally. Tidied up the code as well. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-06-07 10:02:35 +02:00
Andy Lutomirski	d319bb79af	x86-64: Map the HPET NX Currently the HPET mapping is a user-accessible syscall instruction at a fixed address some of the time. A sufficiently determined hacker might be able to guess when. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Jesper Juhl <jj@chaosbits.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Mikael Pettersson <mikpe@it.uu.se> Cc: Andi Kleen <andi@firstfloor.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: Valdis.Kletnieks@vt.edu Cc: pageexec@freemail.hu Link: http://lkml.kernel.org/r/ab41b525a4ca346b1ca1145d16fb8d181861a8aa.1307292171.git.luto@mit.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-06-05 21:30:33 +02:00
Andy Lutomirski	0d7b8547fb	x86-64: Remove kernel.vsyscall64 sysctl It's unnecessary overhead in code that's supposed to be highly optimized. Removing it allows us to remove one of the two syscall instructions in the vsyscall page. The only sensible use for it is for UML users, and it doesn't fully address inconsistent vsyscall results on UML. The real fix for UML is to stop using vsyscalls entirely. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Jesper Juhl <jj@chaosbits.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Mikael Pettersson <mikpe@it.uu.se> Cc: Andi Kleen <andi@firstfloor.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: Valdis.Kletnieks@vt.edu Cc: pageexec@freemail.hu Link: http://lkml.kernel.org/r/973ae803fe76f712da4b2740e66dccf452d3b1e4.1307292171.git.luto@mit.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-06-05 21:30:33 +02:00
Andy Lutomirski	9fd67b4ed0	x86-64: Give vvars their own page Move vvars out of the vsyscall page into their own page and mark it NX. Without this patch, an attacker who can force a daemon to call some fixed address could wait until the time contains, say, 0xCD80, and then execute the current time. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Jesper Juhl <jj@chaosbits.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Mikael Pettersson <mikpe@it.uu.se> Cc: Andi Kleen <andi@firstfloor.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: Valdis.Kletnieks@vt.edu Cc: pageexec@freemail.hu Link: http://lkml.kernel.org/r/b1460f81dc4463d66ea3f2b5ce240f58d48effec.1307292171.git.luto@mit.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-06-05 21:30:32 +02:00
Andy Lutomirski	6879eb2dee	x86-64: Fix alignment of jiffies variable It's declared __attribute__((aligned(16)) but it's explicitly not aligned. This is probably harmless but it's a bit embarrassing. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Jesper Juhl <jj@chaosbits.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Mikael Pettersson <mikpe@it.uu.se> Cc: Andi Kleen <andi@firstfloor.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: Valdis.Kletnieks@vt.edu Cc: pageexec@freemail.hu Link: http://lkml.kernel.org/r/5f3bc5542e9aaa9382d53f153f54373165cdef89.1307292171.git.luto@mit.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-06-05 21:30:31 +02:00
Borislav Petkov	dd2897bf0f	x86, asm: Fix binutils 2.16 issue with __USER32_CS While testing the patchset at http://lkml.kernel.org/r/1306873314-32523-1-git-send-email-bp@alien8.de with binutils 2.16.1 from hell, kernel build fails with the following error: arch/x86/ia32/ia32entry.S: Assembler messages: arch/x86/ia32/ia32entry.S:139: Error: too many positional arguments make[2]: * [arch/x86/ia32/ia32entry.o] Error 1 make[1]: * [arch/x86/ia32] Error 2 make[1]: * Waiting for unfinished jobs.... make: * [arch/x86] Error 2 make: *** Waiting for unfinished jobs.... due to spaces between the operators of the __USER32_CS define. Fix it so that gas 2.16 can swallow it too. Signed-off-by: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1307131642-32595-1-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-06-03 14:39:14 -07:00
Borislav Petkov	838feb4754	x86, asm: Flip RESTORE_ARGS arguments logic ... thus getting rid of the "else" part of the conditional statement in the macro. No functionality change. Signed-off-by: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1306873314-32523-4-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-06-03 14:38:53 -07:00
Borislav Petkov	cac0e0a78f	x86, asm: Flip SAVE_ARGS arguments logic This saves us the else part of the conditional statement in the macro. No functionality change. Signed-off-by: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1306873314-32523-3-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-06-03 14:38:51 -07:00
Borislav Petkov	a268fcfaa6	x86, asm: Thin down SAVE/RESTORE_* asm macros Use dwarf2 cfi annotation macros, making SAVE/RESTORE_* marginally more readable. No functionality change. Signed-off-by: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1306873314-32523-2-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-06-03 14:38:49 -07:00
Jack Steiner	55ba412028	x86, UV: Clean up uv_mmrs.h No code changes. Reformat definitions to make it more readable. I fixed alignment of comments in the structure definitions. Also aligned comments and most field definitions & values. Also sorted the defines for the SHIFT & MASK values for each MMR. This make the file visually much more acceptable. Some of the symbol names are still quite long. The file is based on post-processing of verilog definitions that are used for the node controller chip design. Although some symbol names are not what I would chose, I would like to maintain compatibility with the names used by the chip designers. We have a number of cross-reference utilities & having common names is important. Signed-off-by: Jack Steiner <steiner@sgi.com> Link: http://lkml.kernel.org/r/20110527145256.GA31224@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu> -- arch/x86/include/asm/uv/uv_mmrs.h \| 2873 +++++++++++++++++++++----------------- 1 file changed, 1600 insertions(+), 1273 deletions(-)	2011-05-30 11:08:48 +02:00
Linus Torvalds	f310642123	Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param x86 idle: deprecate "no-hlt" cmdline param x86 idle APM: deprecate CONFIG_APM_CPU_IDLE x86 idle floppy: deprecate disable_hlt() x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it x86 idle: clarify AMD erratum 400 workaround idle governor: Avoid lock acquisition to read pm_qos before entering idle cpuidle: menu: fixed wrapping timers at 4.294 seconds	2011-05-29 11:18:09 -07:00
Len Brown	02c68a0201	x86 idle: clarify AMD erratum 400 workaround The workaround for AMD erratum 400 uses the term "c1e" falsely suggesting: 1. Intel C1E is somehow involved 2. All AMD processors with C1E are involved Use the string "amd_c1e" instead of simply "c1e" to clarify that this workaround is specific to AMD's version of C1E. Use the string "e400" to clarify that the workaround is specific to AMD processors with Erratum 400. This patch is text-substitution only, with no functional change. cc: x86@kernel.org Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Len Brown <len.brown@intel.com>	2011-05-29 03:38:57 -04:00
Linus Torvalds	a947e23a8e	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, asm: Clean up desc.h a bit x86, amd: Do not enable ARAT feature on AMD processors below family 0x12 x86: Move do_page_fault()'s error path under unlikely() x86, efi: Retain boot service code until after switching to virtual mode x86: Remove unnecessary check in detect_ht() x86: Reorder mm_context_t to remove x86_64 alignment padding and thus shrink mm_struct x86, UV: Clean up uv_tlb.c x86, UV: Add support for SGI UV2 hub chip x86, cpufeature: Update CPU feature RDRND to RDRAND	2011-05-28 12:57:01 -07:00
Linus Torvalds	571503e100	Merge branch 'setns' * setns: ns: Wire up the setns system call Done as a merge to make it easier to fix up conflicts in arm due to addition of sendmmsg system call	2011-05-28 10:51:01 -07:00
Eric W. Biederman	7b21fddd08	ns: Wire up the setns system call 32bit and 64bit on x86 are tested and working. The rest I have looked at closely and I can't find any problems. setns is an easy system call to wire up. It just takes two ints so I don't expect any weird architecture porting problems. While doing this I have noticed that we have some architectures that are very slow to get new system calls. cris seems to be the slowest where the last system calls wired up were preadv and pwritev. avr32 is weird in that recvmmsg was wired up but never declared in unistd.h. frv is behind with perf_event_open being the last syscall wired up. On h8300 the last system call wired up was epoll_wait. On m32r the last system call wired up was fallocate. mn10300 has recvmmsg as the last system call wired up. The rest seem to at least have syncfs wired up which was new in the 2.6.39. v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com> v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com> v4: Moved wiring up of the system call to another patch v5: ported to v2.6.39-rc6 v6: rebased onto parisc-next and net-next to avoid syscall conflicts. v7: ported to Linus's latest post 2.6.39 tree. > arch/blackfin/include/asm/unistd.h \| 3 ++- > arch/blackfin/mach-common/entry.S \| 1 + Acked-by: Mike Frysinger <vapier@gentoo.org> Oh - ia64 wiring looks good. Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-28 10:48:39 -07:00
Linus Torvalds	f23a5e1405	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM: Fix PM QOS's user mode interface to work with ASCII input PM / Hibernate: Update kerneldoc comments in hibernate.c PM / Hibernate: Remove arch_prepare_suspend() PM / Hibernate: Update some comments in core hibernate code	2011-05-27 14:27:34 -07:00
Ingo Molnar	9a3865b185	x86, asm: Clean up desc.h a bit I have looked at this file and found it rather ugly - improve readability a bit. No change in functionality. Link: http://lkml.kernel.org/n/tip-incpt6y26yd8586idx65t9ll@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-27 09:30:50 +02:00
Mike Frysinger	63ab25ebbc	kgdbts: unify/generalize gdb breakpoint adjustment The Blackfin arch, like the x86 arch, needs to adjust the PC manually after a breakpoint is hit as normally this is handled by the remote gdb. However, rather than starting another arch ifdef mess, create a common GDB_ADJUSTS_BREAK_OFFSET define for any arch to opt-in via their kgdb.h. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Dongdong Deng <dongdong.deng@windriver.com> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-26 17:12:36 -07:00
Mike Frysinger	c46dd6b48d	x86: convert to asm-generic ptrace.h Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Cc: Dongdong Deng <dongdong.deng@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-26 17:12:36 -07:00
Linus Torvalds	14587a2a25	Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: vdso: Remove unused variable x86-64: Optimize vDSO time() x86-64: Add time to vDSO x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO x86-64: Move vread_tsc into a new file with sensible options x86-64: Vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0 x86-64: Don't generate cmov in vread_tsc x86-64: Remove unnecessary barrier in vread_tsc x86-64: Clean up vdso/kernel shared variables	2011-05-26 12:19:31 -07:00
Linus Torvalds	f8d613e2a6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem: xen: cleancache shim to Xen Transcendent Memory ocfs2: add cleancache support ext4: add cleancache support btrfs: add cleancache support ext3: add cleancache support mm/fs: add hooks to support cleancache mm: cleancache core ops functions and config fs: add field to superblock to support cleancache mm/fs: cleancache documentation Fix up trivial conflict in fs/btrfs/extent_io.c due to includes	2011-05-26 10:50:56 -07:00
Dan Magenheimer	5bc20fc597	xen: cleancache shim to Xen Transcendent Memory This patch provides a shim between the kernel-internal cleancache API (see Documentation/mm/cleancache.txt) and the Xen Transcendent Memory ABI (see http://oss.oracle.com/projects/tmem). Xen tmem provides "hypervisor RAM" as an ephemeral page-oriented pseudo-RAM store for cleancache pages, shared cleancache pages, and frontswap pages. Tmem provides enterprise-quality concurrency, full save/restore and live migration support, compression and deduplication. A presentation showing up to 8% faster performance and up to 52% reduction in sectors read on a kernel compile workload, despite aggressive in-kernel page reclamation ("self-ballooning") can be found at: http://oss.oracle.com/projects/tmem/dist/documentation/presentations/TranscendentMemoryXenSummit2010.pdf Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik Van Riel <riel@redhat.com> Cc: Jan Beulich <JBeulich@novell.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Nitin Gupta <ngupta@vflare.org>	2011-05-26 10:02:21 -06:00
Ingo Molnar	de66ee979d	Merge branch 'linus' into x86/urgent Merge reason: we want to queue up a dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-26 13:51:35 +02:00
Roland Dreier	dbee8a0aff	x86: remove 32-bit versions of readq()/writeq() The presense of a writeq() implementation on 32-bit x86 that splits the 64-bit write into two 32-bit writes turns out to break the mpt2sas driver (and in general is risky for drivers as was discussed in <http://lkml.kernel.org/r/adaab6c1h7c.fsf@cisco.com>). To fix this, revert `2c5643b1c5` ("x86: provide readq()/writeq() on 32-bit too") and follow-on cleanups. This unfortunately leads to pushing non-atomic definitions of readq() and write() to various x86-only drivers that in the meantime started using the definitions in the x86 version of <asm/io.h>. However as discussed exhaustively, this is actually the right thing to do, because the right way to split a 64-bit transaction is hardware dependent and therefore belongs in the hardware driver (eg mpt2sas needs a spinlock to make sure no other accesses occur in between the two halves of the access). Build tested on 32- and 64-bit x86 allmodconfig. Link: http://lkml.kernel.org/r/x86-32-writeq-is-broken@mdm.bga.com Acked-by: Hitoshi Mitake <h.mitake@gmail.com> Cc: Kashyap Desai <Kashyap.Desai@lsi.com> Cc: Len Brown <lenb@kernel.org> Cc: Ravi Anand <ravi.anand@qlogic.com> Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Acked-by: James Bottomley <James.Bottomley@parallels.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:44 -07:00
Richard Kennedy	af6a25f0e1	x86: Reorder mm_context_t to remove x86_64 alignment padding and thus shrink mm_struct Reorder mm_context_t to remove alignment padding on 64 bit builds shrinking its size from 64 to 56 bytes. This allows mm_struct to shrink from 840 to 832 bytes, so using one fewer cache lines, and getting more objects per slab when using slub. slabinfo mm_struct reports before :- Sizes (bytes) Slabs ----------------------------------- Object : 840 Total : 7 SlabObj: 896 Full : 1 SlabSiz: 16384 Partial: 4 Loss : 56 CpuSlab: 2 Align : 64 Objects: 18 after :- Sizes (bytes) Slabs ---------------------------------- Object : 832 Total : 7 SlabObj: 832 Full : 1 SlabSiz: 16384 Partial: 4 Loss : 0 CpuSlab: 2 Align : 64 Objects: 19 Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Cc: wilsons@start.ca Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/1306244999.1999.5.camel@castor.rsk Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-25 16:16:41 +02:00
Cliff Wickman	f073cc8f39	x86, UV: Clean up uv_tlb.c SGI UV's uv_tlb.c driver has become rather hard to read, with overly large functions, non-standard coding style and (way) too long variable, constant and function names and non-obvious code flow sequences. This patch improves the readability and maintainability of the driver significantly, by doing the following strict code cleanups with no side effects: - Split long functions into shorter logical functions. - Shortened some variable and structure member names. - Added special functions for reads and writes of MMR regs with very long names. - Added the 'tunables' table to shortened tunables_write(). - Added the 'stat_description' table to shorten uv_ptc_proc_write(). - Pass fewer 'stat' arguments where it can be derived from the 'bcp' argument. - Function definitions consistent on one line, and inline in few (short) cases. - Moved some small structures and an atomic inline function to the header file. - Moved some local variables to the blocks where they are used. - Updated the copyright date. - Shortened uv_write_global_mmr64() etc. using some aliasing; no line breaks. Renamed many uv_.. functions that are not exported. - Aligned structure fields. [ note that not all structures are aligned the same way though; I'd like to keep the extensive commenting in some of them. ] - Shortened some long structure names. - Standard pass/fail exit from init_per_cpu() - Vertical alignment for mass initializations. - More separation between blocks of code. Tested on a 16-processor Altix UV. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: penberg@kernel.org Link: http://lkml.kernel.org/r/E1QOw12-0004MN-Lp@eag09.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-25 14:20:14 +02:00
Jack Steiner	2a919596c1	x86, UV: Add support for SGI UV2 hub chip This patch adds support for a new version of the SGI UV hub chip. The hub chip is the node controller that connects multiple blades into a larger coherent SSI. For the most part, UV2 is compatible with UV1. The majority of the changes are in the addresses of MMRs and in a few cases, the contents of MMRs. These changes are the result in changes in the system topology such as node configuration, processor types, maximum nodes, physical address sizes, etc. Signed-off-by: Jack Steiner <steiner@sgi.com> Link: http://lkml.kernel.org/r/20110511175028.GA18006@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-25 14:20:13 +02:00
Kees Cook	7ccafc5f75	x86, cpufeature: Update CPU feature RDRND to RDRAND The Intel manual changed the name of the CPUID bit to match the instruction name. We should follow suit for sanity's sake. (See Intel SDM Volume 2, Table 3-20 "Feature Information Returned in the ECX Register".) [ hpa: we can only do this at this time because there are currently no CPUs with this feature on the market, hence this is pre-hardware enabling. However, Cc:'ing stable so that stable can present a consistent ABI. ] Signed-off-by: Kees Cook <kees.cook@canonical.com> Link: http://lkml.kernel.org/r/20110524232926.GA27728@outflux.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: <stable@kernel.org> v2.6.36-39	2011-05-24 16:37:27 -07:00

1 2 3 4 5 ...

3014 Commits