linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-23 05:14:51 +07:00

Author	SHA1	Message	Date
Rafael J. Wysocki	cf579dfb82	PM / Sleep: Introduce "late suspend" and "early resume" of devices The current device suspend/resume phases during system-wide power transitions appear to be insufficient for some platforms that want to use the same callback routines for saving device states and related operations during runtime suspend/resume as well as during system suspend/resume. In principle, they could point their .suspend_noirq() and .resume_noirq() to the same callback routines as their .runtime_suspend() and .runtime_resume(), respectively, but at least some of them require device interrupts to be enabled while the code in those routines is running. It also makes sense to have device suspend-resume callbacks that will be executed with runtime PM disabled and with device interrupts enabled in case someone needs to run some special code in that context during system-wide power transitions. Apart from this, .suspend_noirq() and .resume_noirq() were introduced as a workaround for drivers using shared interrupts and failing to prevent their interrupt handlers from accessing suspended hardware. It appears to be better not to use them for other porposes, or we may have to deal with some serious confusion (which seems to be happening already). For the above reasons, introduce new device suspend/resume phases, "late suspend" and "early resume" (and analogously for hibernation) whose callback will be executed with runtime PM disabled and with device interrupts enabled and whose callback pointers generally may point to runtime suspend/resume routines. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Reviewed-by: Kevin Hilman <khilman@ti.com>	2012-01-29 20:38:29 +01:00
Dan Carpenter	d0caf29250	x86/dumpstack: Remove unneeded check in dump_trace() Smatch complains that we have some inconsistent NULL checking. If "task" were NULL then it would lead to a NULL dereference later. We can remove this test because earlier on in the function we have: if (!task) task = current; Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Clemens Ladisch <clemens@ladisch.de> Link: http://lkml.kernel.org/r/20120128105246.GA25092@elgon.mountain Signed-off-by: Ingo Molnar <mingo@elte.hu>	2012-01-28 13:09:06 +01:00
Ingo Molnar	44a6839711	Merge branch 'perf/fast' into perf/core Merge reason: Lets ready it for v3.4 Signed-off-by: Ingo Molnar <mingo@elte.hu>	2012-01-27 12:08:09 +01:00
Thomas Renninger	fad12ac8c8	CPU: Introduce ARCH_HAS_CPU_AUTOPROBE and X86 parts This patch is based on Andi Kleen's work: Implement autoprobing/loading of modules serving CPU specific features (x86cpu autoloading). And Kay Siever's work to get rid of sysdev cpu structures and making use of struct device instead. Before, the cpuid driver had to be loaded to get the x86cpu autoloading feature. With this patch autoloading works through the /sys/devices/system/cpu object Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Dave Jones <davej@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Huang Ying <ying.huang@intel.com> Cc: Len Brown <lenb@kernel.org> Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-26 16:49:08 -08:00
Andi Kleen	78ff123b05	x86: autoload microcode driver on Intel and AMD systems v2 Don't try to describe the actual models for now. v2: Fix typo: X86_VENDOR_ANY -> X86_FAMILY_ANY (trenn) Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-26 16:49:07 -08:00
Thomas Renninger	2f1e097e24	X86: Introduce HW-Pstate scattered cpuid feature It is rather similar to CPB (boot capability) feature and exists since fam10h (can be looked up in AMD's BKDG). The feature is needed for powernow-k8 to cleanup init functions and to provide proper autoloading matching with the new x86cpu modalias feature. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Dave Jones <davej@redhat.com> Cc: Borislav Petkov <bp@amd64.org> Signed-off-by: Thomas Renninger <trenn@suse.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-26 16:49:06 -08:00
Andi Kleen	644e9cbbe3	Add driver auto probing for x86 features v4 There's a growing number of drivers that support a specific x86 feature or CPU. Currently loading these drivers currently on a generic distribution requires various driver specific hacks and it often doesn't work. This patch adds auto probing for drivers based on the x86 cpuid information, in particular based on vendor/family/model number and also based on CPUID feature bits. For example a common issue is not loading the SSE 4.2 accelerated CRC module: this can significantly lower the performance of BTRFS which relies on fast CRC. Another issue is loading the right CPUFREQ driver for the current CPU. Currently distributions often try all all possible driver until one sticks, which is not really a good way to do this. It works with existing udev without any changes. The code exports the x86 information as a generic string in sysfs that can be matched by udev's pattern matching. This scheme does not support numeric ranges, so if you want to handle e.g. ranges of model numbers they have to be encoded in ASCII or simply all models or families listed. Fixing that would require changing udev. Another issue is that udev will happily load all drivers that match, there is currently no nice way to stop a specific driver from being loaded if it's not needed (e.g. if you don't need fast CRC) But there are not that many cpu specific drivers around and they're all not that bloated, so this isn't a particularly serious issue. Originally this patch added the modalias to the normal cpu sysdevs. However sysdevs don't have all the infrastructure needed for udev, so it couldn't really autoload drivers. This patch instead adds the CPU modaliases to the cpuid devices, which are real devices with full support for udev. This implies that the cpuid driver has to be loaded to use this. This patch just adds infrastructure, some driver conversions in followups. Thanks to Kay for helping with some sysfs magic. v2: Constifcation, some updates v4: (trenn@suse.de): - Use kzalloc instead of kmalloc to terminate modalias buffer - Use uppercase hex values to match correctly against hex values containing letters Cc: Dave Jones <davej@redhat.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Jen Axboe <axboe@kernel.dk> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Huang Ying <ying.huang@intel.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-26 16:44:41 -08:00
Tony Luck	08dda402d6	x86/mce: Replace hard coded hex constants with symbolic defines Magic constants like 0x0134 in code just invite questions on where they come from, what they mean, can they be changed. Provide #defines for the architecturally defined MCACOD values with a reference to the Intel Software Developers manual which describes them. Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-01-26 16:02:22 -08:00
Prarit Bhargava	b0f4c4b32c	bugs, x86: Fix printk levels for panic, softlockups and stack dumps rsyslog will display KERN_EMERG messages on a connected terminal. However, these messages are useless/undecipherable for a general user. For example, after a softlockup we get: Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ... kernel:Stack: Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ... kernel:Call Trace: Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ... kernel:Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 e0 ff ff 48 89 d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e0 0f 01 c9 <e8> ea 69 dd ff 4c 29 e8 48 89 c7 e8 0f bc da ff 49 89 c4 49 89 This happens because the printk levels for these messages are incorrect. Only an informational message should be displayed on a terminal. I modified the printk levels for various messages in the kernel and tested the output by using the drivers/misc/lkdtm.c kernel modules (ie, softlockups, panics, hard lockups, etc.) and confirmed that the console output was still the same and that the output to the terminals was correct. For example, in the case of a softlockup we now see the much more informative: Message from syslogd@intel-s3e37-04 at Jan 25 10:18:06 ... BUG: soft lockup - CPU4 stuck for 60s! instead of the above confusing messages. AFAICT, the messages no longer have to be KERN_EMERG. In the most important case of a panic we set console_verbose(). As for the other less severe cases the correct data is output to the console and /var/log/messages. Successfully tested by me using the drivers/misc/lkdtm.c module. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: dzickus@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1327586134-11926-1-git-send-email-prarit@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2012-01-26 21:28:45 +01:00
Andreas Herrmann	5b68edc91c	x86/microcode_amd: Add support for CPU family specific container files We've decided to provide CPU family specific container files (starting with CPU family 15h). E.g. for family 15h we have to load microcode_amd_fam15h.bin instead of microcode_amd.bin Rationale is that starting with family 15h patch size is larger than 2KB which was hard coded as maximum patch size in various microcode loaders (not just Linux). Container files which include patches larger than 2KB cause different kinds of trouble with such old patch loaders. Thus we have to ensure that the default container file provides only patches with size less than 2KB. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/r/20120120164412.GD24508@alberich.amd.com [ documented the naming convention and tidied the code a bit. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2012-01-26 12:06:39 +01:00
Ingo Molnar	4e9f44ba29	MCE recovery (data path only) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAABAgAGBQJPHy7VAAoJEKurIx+X31iBHcsP/1VcGuxFAL5i/zBqUqhbS7BL s+4or1j3NOcxIePQ9egg1L/sLzD+jmo37ObFMTzFOLwuLeodtJF6e0DXQhR7bMKz UqOS4WAhNxRBtZtUqIbIiMoDG4Vny1atdqxDQKzmV88ulTG2+JE5U6sGjfTdWvX7 gZA6Vj31Dz7p6scPT2j8tnLjFV+XvVJSBp/2rgi2Nw81UzBeIRZRiWZrBMLemPCU T82OEffnIpSdn60sktMN/ht99yGQO31zT0c+/72Z0ysZAPlTjFbW7CZJHPZmLIVB tPkoTRFOf4iwjy2pZNzs9bB8ord/As3IyTxAsfYUin4N2bX27n058uTQ3CqbgEz+ pa6C5N0ZrV9plYa9BbgCHmNIkhEONIb3WtH27uh/hZOztDA2CXzPT5mm4FOzmrJ7 DGVBqmXth6g2jYJNT/K2QgmVMZM0CeXQnoDJP54sXzv7F4dEM5P64Lz6E1kCd5Jf x9O1orDnEVXssgEPVtF/eEjIQK/vF7s1BUUlMBZJwdAyTwCiD8RvueG87bApnA2z eO8VS62akqjpDt5sHboAGJrjcuhqnkbgtG2dn0EqONzk8DJPnhFXVLmSbvH+KuTC OguH2LC5N7n9Wjr5a9Duw2DdIj8njvzFrKVzo/l6r3m99u/Jby54vGk2cPLwfGvp /9Y+SK2Ou6LSbPiRU4dP =ofSb -----END PGP SIGNATURE----- Merge tag 'mce-recovery-for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/mce Implement MCE recovery for the data load error path and assorted cleanups. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2012-01-26 11:40:13 +01:00
H. Peter Anvin	282f445a77	Merge remote-tracking branch 'linus/master' into x86/urgent	2012-01-19 12:56:50 -08:00
Linus Torvalds	507a03c1cb	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux This includes initial support for the recently published ACPI 5.0 spec. In particular, support for the "hardware-reduced" bit that eliminates the dependency on legacy hardware. APEI has patches resulting from testing on real hardware. Plus other random fixes. * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (52 commits) acpi/apei/einj: Add extensions to EINJ from rev 5.0 of acpi spec intel_idle: Split up and provide per CPU initialization func ACPI processor: Remove unneeded variable passed by acpi_processor_hotadd_init V2 ACPI processor: Remove unneeded cpuidle_unregister_driver call intel idle: Make idle driver more robust intel_idle: Fix a cast to pointer from integer of different size warning in intel_idle ACPI: kernel-parameters.txt : Add intel_idle.max_cstate intel_idle: remove redundant local_irq_disable() call ACPI processor: Fix error path, also remove sysdev link ACPI: processor: fix acpi_get_cpuid for UP processor intel_idle: fix API misuse ACPI APEI: Convert atomicio routines ACPI: Export interfaces for ioremapping/iounmapping ACPI registers ACPI: Fix possible alignment issues with GAS 'address' references ACPI, ia64: Use SRAT table rev to use 8bit or 16/32bit PXM fields (ia64) ACPI, x86: Use SRAT table rev to use 8bit or 32bit PXM fields (x86/x86-64) ACPI: Store SRAT table revision ACPI, APEI, Resolve false conflict between ACPI NVS and APEI ACPI, Record ACPI NVS regions ACPI, APEI, EINJ, Refine the fix of resource conflict ...	2012-01-18 15:51:48 -08:00
Al Viro	6015ff1031	x86-32: Fix build failure with AUDIT=y, AUDITSYSCALL=n JONGMAN HEO reports: With current linus git (commit `a25a2b84`), I got following build error, arch/x86/kernel/vm86_32.c: In function 'do_sys_vm86': arch/x86/kernel/vm86_32.c:340: error: implicit declaration of function '__audit_syscall_exit' make[3]: *** [arch/x86/kernel/vm86_32.o] Error 1 OK, I can reproduce it (32bit allmodconfig with AUDIT=y, AUDITSYSCALL=n) It's due to commit `d7e7528bcd`: "Audit: push audit success and retcode into arch ptrace.h". Reported-by: JONGMAN HEO <jongman.heo@samsung.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-17 18:10:11 -08:00
Linus Torvalds	f429ee3b80	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit: (29 commits) audit: no leading space in audit_log_d_path prefix audit: treat s_id as an untrusted string audit: fix signedness bug in audit_log_execve_info() audit: comparison on interprocess fields audit: implement all object interfield comparisons audit: allow interfield comparison between gid and ogid audit: complex interfield comparison helper audit: allow interfield comparison in audit rules Kernel: Audit Support For The ARM Platform audit: do not call audit_getname on error audit: only allow tasks to set their loginuid if it is -1 audit: remove task argument to audit_set_loginuid audit: allow audit matching on inode gid audit: allow matching on obj_uid audit: remove audit_finish_fork as it can't be called audit: reject entry,always rules audit: inline audit_free to simplify the look of generic code audit: drop audit_set_macxattr as it doesn't do anything audit: inline checks for not needing to collect aux records audit: drop some potentially inadvisable likely notations ... Use evil merge to fix up grammar mistakes in Kconfig file. Bad speling and horrible grammar (and copious swearing) is to be expected, but let's keep it to commit messages and comments, rather than expose it to users in config help texts or printouts.	2012-01-17 16:41:31 -08:00
Linus Torvalds	68f30fbee1	x86, tsc: Fix SMI induced variation in quick_pit_calibrate() pit_expect_msb() returns success wrongly in the below SMI scenario: a. pit_verify_msb() has not yet seen the MSB transition. b. we are close to the MSB transition though and got a SMI immediately after returning from pit_verify_msb() which didn't see the MSB transition. PIT MSB transition has happened somewhere during SMI execution. c. returned from SMI and we noted down the 'tsc', saw the pit MSB change now and exited the loop to calculate 'deltatsc'. Instead of noting the TSC at the MSB transition, we are way off because of the SMI. And as the SMI happened between the pit_verify_msb() and before the 'tsc' is recorded in the for loop, 'delattsc' (d1/d2 in quick_pit_calibrate()) will be small and quick_pit_calibrate() will not notice this error. Depending on whether SMI disturbance happens while computing d1 or d2, we will see the TSC calibrated value smaller or bigger than the expected value. As a result, in a cluster we were seeing a variation of approximately +/- 20MHz in the calibrated values, resulting in NTP failures. [ As far as the SMI source is concerned, this is a periodic SMI that gets disabled after ACPI is enabled by the OS. But the TSC calibration happens before the ACPI is enabled. ] To address this, change pit_expect_msb() so that - the 'tsc' is the TSC in between the two reads that read the MSB change from the PIT (same as before) - the 'delta' is the difference in TSC from before the MSB changed to after the MSB changed. Now the delta is twice as big as before (it covers four PIT accesses, roughly 4us) and quick_pit_calibrate() will loop a bit longer to get the calibrated value with in the 500ppm precision. As the delta (d1/d2) covers four PIT accesses, actual calibrated result might be closer to 250ppm precision. As the loop now takes longer to stabilize, double MAX_QUICK_PIT_MS to 50. SMI disturbance will showup as much larger delta's and the loop will take longer than usual for the result to be with in the accepted precision. Or will fallback to slow PIT calibration if it takes more than 50msec. Also while we are at this, remove the calibration correction that aims to get the result to the middle of the error bars. We really don't know which direction to correct into, so remove it. Reported-and-tested-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1326843337.5291.4.camel@sbsiddha-mobl2 Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-01-17 15:46:51 -08:00
Eric Paris	b05d8447e7	audit: inline audit_syscall_entry to reduce burden on archs Every arch calls: if (unlikely(current->audit_context)) audit_syscall_entry() which requires knowledge about audit (the existance of audit_context) in the arch code. Just do it all in static inline in audit.h so that arch's can remain blissfully ignorant. Signed-off-by: Eric Paris <eparis@redhat.com>	2012-01-17 16:16:56 -05:00
Eric Paris	d7e7528bcd	Audit: push audit success and retcode into arch ptrace.h The audit system previously expected arches calling to audit_syscall_exit to supply as arguments if the syscall was a success and what the return code was. Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things by converting from negative retcodes to an audit internal magic value stating success or failure. This helper was wrong and could indicate that a valid pointer returned to userspace was a failed syscall. The fix is to fix the layering foolishness. We now pass audit_syscall_exit a struct pt_reg and it in turns calls back into arch code to collect the return value and to determine if the syscall was a success or failure. We also define a generic is_syscall_success() macro which determines success/failure based on if the value is < -MAX_ERRNO. This works for arches like x86 which do not use a separate mechanism to indicate syscall failure. We make both the is_syscall_success() and regs_return_value() static inlines instead of macros. The reason is because the audit function must take a void* for the regs. (uml calls theirs struct uml_pt_regs instead of just struct pt_regs so audit_syscall_exit can't take a struct pt_regs). Since the audit function takes a void* we need to use static inlines to cast it back to the arch correct structure to dereference it. The other major change is that on some arches, like ia64, MIPS and ppc, we change regs_return_value() to give us the negative value on syscall failure. THE only other user of this macro, kretprobe_example.c, won't notice and it makes the value signed consistently for the audit functions across all archs. In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old audit code as the return value. But the ptrace_64.h code defined the macro regs_return_value() as regs[3]. I have no idea which one is correct, but this patch now uses the regs_return_value() function, so it now uses regs[3]. For powerpc we previously used regs->result but now use the regs_return_value() function which uses regs->gprs[3]. regs->gprs[3] is always positive so the regs_return_value(), much like ia64 makes it negative before calling the audit code when appropriate. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion] Acked-by: Tony Luck <tony.luck@intel.com> [for ia64] Acked-by: Richard Weinberger <richard@nod.at> [for uml] Acked-by: David S. Miller <davem@davemloft.net> [for sparc] Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips] Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]	2012-01-17 16:16:56 -05:00
Huang Ying	b54ac6d2a2	ACPI, Record ACPI NVS regions Some firmware will access memory in ACPI NVS region via APEI. That is, instructions in APEI ERST/EINJ table will read/write ACPI NVS region. The original resource conflict checking in APEI code will check memory/ioport accessed by APEI via general resource management mechanism. But ACPI NVS region is marked as busy already, so that the false resource conflict will prevent APEI ERST/EINJ to work. To fix this, this patch record ACPI NVS regions, so that we can avoid request resources for memory region inside it. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2012-01-17 03:54:44 -05:00
Linus Torvalds	5674124f9f	Merge branch 'x86-syscall-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-syscall-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Move <asm/asm-offsets.h> from trace_syscalls.c to asm/syscall.h x86, um: Fix typo in 32-bit system call modifications um: Use $(srctree) not $(KBUILD_SRC) x86, um: Mark system call tables readonly x86, um: Use the same style generated syscall tables as native um: Generate headers before generating user-offsets.s um: Run host archheaders, allow use of host generated headers kbuild, headers.sh: Don't make archheaders explicitly x86, syscall: Allow syscall offset to be symbolic x86, syscall: Re-fix typo in comment x86: Simplify syscallhdr.sh x86: Generate system call tables and unistd_*.h from tables checksyscalls: Use arch/x86/syscalls/syscall_32.tbl as source x86: Machine-readable syscall tables and scripts to process them trace: Include <asm/asm-offsets.h> in trace_syscalls.c x86-64, ia32: Move compat_ni_syscall into C and its own file x86-64, syscall: Adjust comment spacing and remove typo kbuild: Add support for an "archheaders" target kbuild: Add support for installing generated asm headers	2012-01-16 18:19:19 -08:00
Greg Kroah-Hartman	e032d80774	mce: fix warning messages about static struct mce_device When suspending, there was a large list of warnings going something like: Device 'machinecheck1' does not have a release() function, it is broken and must be fixed This patch turns the static mce_devices into dynamically allocated, and properly frees them when they are removed from the system. It solves the warning messages on my laptop here. Reported-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Djalal Harouni <tixxdz@opendz.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@amd64.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-16 17:08:42 -08:00
Linus Torvalds	83c2f912b4	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) perf tools: Fix compile error on x86_64 Ubuntu perf report: Fix --stdio output alignment when --showcpuutilization used perf annotate: Get rid of field_sep check perf annotate: Fix usage string perf kmem: Fix a memory leak perf kmem: Add missing closedir() calls perf top: Add error message for EMFILE perf test: Change type of '-v' option to INCR perf script: Add missing closedir() calls tracing: Fix compile error when static ftrace is enabled recordmcount: Fix handling of elf64 big-endian objects. perf tools: Add const.h to MANIFEST to make perf-tar-src-pkg work again perf tools: Add support for guest/host-only profiling perf kvm: Do guest-only counting by default perf top: Don't update total_period on process_sample perf hists: Stop using 'self' for struct hist_entry perf hists: Rename total_session to total_period x86: Add counter when debug stack is used with interrupts enabled x86: Allow NMIs to hit breakpoints in i386 x86: Keep current stack in NMI breakpoints ...	2012-01-15 11:26:35 -08:00
Linus Torvalds	f0ed5b9a28	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, atomic: atomic64_read() take a const pointer x86, UV: Update Boot messages for SGI UV2 platform	2012-01-15 11:26:09 -08:00
Linus Torvalds	0a80939b3e	Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPD2aFAAoJENkgDmzRrbjxNzsQAIeYbbrXYLjr6kQzUSngj/eC FzjaTEfYTQIeuQCFJHcHthyc5lXV4sQbo3jOezW+Bp5yuDJL2aWIHesSfWZe7imu zQdM4VshOYdAmUR9Q0AW5zhB8Smbs7/AyABiF2jm4p0ZPOuyMDSlei9sjvE9Vjvt B7g5ht7L6kz0JbDnwwy0u5gs+tEitwpXYId9Y4ysZIBzIbL0qkPX8veOddGTMy0N 8xhWXaKtufpjvxFD2ORLDsw3AkoF1xXSNuFd/5nzCNpbeE7TW931jfkPoqJumuAO 7GLxcU9kKYl+IICobC6wBtsj/RrB7w+cBXMvPGwdBliam1qaRhUcJZi5FLM/Ha5d 2A9QDYNUpoXiO8JbPXrV9Z+Y0+Co8RilsQj7R/rjZh6AbbYCWt9nxzx2Svl/RfTr xfiimHuB2P3rHjOvpCXULwOOuE5c8MzPuWncpdjiD3uGXOY/aY+X1m+if/quJw9D grPlKL0+YiRakEYUeGG4M77KCqyKFZaF7L7UQPbqfZcj8V/9AW3/7U5I/B9RlAjs idsr4fcf5s0N+oKUyTCW1ncpUDQNiwbU2NyJQqeu1ZxaRGj72AgyvsaNeyIPDyK+ f6x95Bi7i8KLjXc9Z1KvJwh2Nxt25gNUiTYVha/9H2NpJGd1cfI15kTOGXrgddVv 1pvuGcJDZwYiwfiXr3FL =HHrh -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://github.com/rustyrussell/linux Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1 * tag 'for-linus' of git://github.com/rustyrussell/linux: module_param: check that bool parameters really are bool. intelfbdrv.c: bailearly is an int module_param paride/pcd: fix bool verbose module parameter. module_param: make bool parameters really bool (drivers & misc) module_param: make bool parameters really bool (arch) module_param: make bool parameters really bool (core code) kernel/async: remove redundant declaration. printk: fix unnecessary module_param_name. lirc_parallel: fix module parameter description. module_param: avoid bool abuse, add bint for special cases. module_param: check type correctness for module_param_array modpost: use linker section to generate table. modpost: use a table rather than a giant if/else statement. modules: sysfs - export: taint, coresize, initsize kernel/params: replace DEBUGP with pr_debug module: replace DEBUGP with pr_debug module: struct module_ref should contains long fields module: Fix performance regression on modules with large symbol tables module: Add comments describing how the "strmap" logic works Fix up conflicts in scripts/mod/file2alias.c due to the new linker- generated table approach to adding __mod_*_device_table entries. The ARM sa11x0 mcp bus needed to be converted to that too.	2012-01-14 12:32:16 -08:00
Srivatsa S. Bhat	a3301b751b	x86/mce: Fix CPU hotplug and suspend regression related to MCE Commit `8a25a2fd12` ("cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem") changed how things are dealt with in the MCE subsystem. Some of the things that got broken due to this are CPU hotplug and suspend/hibernate. MCE uses per_cpu allocations of struct device. So, when a CPU goes offline and comes back online, in order to ensure that we start from a clean slate with respect to the MCE subsystem, zero out the entire per_cpu device structure to 0 before using it. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-13 19:11:35 -08:00
Rusty Russell	476bc0015b	module_param: make bool parameters really bool (arch) module_param(bool) used to counter-intuitively take an int. In `fddd5201` (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-01-13 09:32:18 +10:30
Linus Torvalds	9fc5c3e323	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel config: Fix the APB_TIMER selection x86/mrst: Add additional debug prints for pb_keys x86/intel config: Revamp configuration to allow for Moorestown and Medfield x86/intel/scu/ipc: Match the changes in the x86 configuration x86/apb: Fix configuration constraints x86: Fix INTEL_MID silly x86/Kconfig: Cyclone-timer depends on x86-summit x86: Reduce clock calibration time during slave cpu startup x86/config: Revamp configuration for MID devices x86/sfi: Kill the IRQ as id hack	2012-01-11 19:13:40 -08:00
Linus Torvalds	541048a1d3	Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, reboot: Fix typo in nmi reboot path x86, NMI: Add to_cpumask() to silence compile warning x86, NMI: NMI selftest depends on the local apic x86: Add stack top margin for stack overflow checking x86, NMI: NMI-selftest should handle the UP case properly x86: Fix the 32-bit stackoverflow-debug build x86, NMI: Add knob to disable using NMI IPIs to stop cpus x86, NMI: Add NMI IPI selftest x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus x86: Clean up the range of stack overflow checking x86: Panic on detection of stack overflow x86: Check stack overflow in detail	2012-01-11 19:13:04 -08:00
Linus Torvalds	bcede2f64a	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, efi: Break up large initrd reads x86, efi: EFI boot stub support efi: Add EFI file I/O data types efi.h: Add boottime->locate_handle search types efi.h: Add graphics protocol guids efi.h: Add allocation types for boottime->allocate_pages() efi.h: Add efi_image_loaded_t efi.h: Add struct definition for boot time services x86: Don't use magic strings for EFI loader signature x86: Add missing bzImage fields to struct setup_header	2012-01-11 19:12:33 -08:00
Linus Torvalds	d0b9706c20	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/numa: Add constraints check for nid parameters mm, x86: Remove debug_pagealloc_enabled x86/mm: Initialize high mem before free_all_bootmem() arch/x86/kernel/e820.c: quiet sparse noise about plain integer as NULL pointer arch/x86/kernel/e820.c: Eliminate bubble sort from sanitize_e820_map() x86: Fix mmap random address range x86, mm: Unify zone_sizes_init() x86, mm: Prepare zone_sizes_init() for unification x86, mm: Use max_low_pfn for ZONE_NORMAL on 64-bit x86, mm: Wrap ZONE_DMA32 with CONFIG_ZONE_DMA32 x86, mm: Use max_pfn instead of highend_pfn x86, mm: Move zone init from paging_init() on 64-bit x86, mm: Use MAX_DMA_PFN for ZONE_DMA on 32-bit	2012-01-11 19:12:10 -08:00
Linus Torvalds	7b67e75147	Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci: (80 commits) x86/PCI: Expand the x86_msi_ops to have a restore MSIs. PCI: Increase resource array mask bit size in pcim_iomap_regions() PCI: DEVICE_COUNT_RESOURCE should be equal to PCI_NUM_RESOURCES PCI: pci_ids: add device ids for STA2X11 device (aka ConneXT) PNP: work around Dell 1536/1546 BIOS MMCONFIG bug that breaks USB x86/PCI: amd: factor out MMCONFIG discovery PCI: Enable ATS at the device state restore PCI: msi: fix imbalanced refcount of msi irq sysfs objects PCI: kconfig: English typo in pci/pcie/Kconfig PCI/PM/Runtime: make PCI traces quieter PCI: remove pci_create_bus() xtensa/PCI: convert to pci_scan_root_bus() for correct root bus resources x86/PCI: convert to pci_create_root_bus() and pci_scan_root_bus() x86/PCI: use pci_scan_bus() instead of pci_scan_bus_parented() x86/PCI: read Broadcom CNB20LE host bridge info before PCI scan sparc32, leon/PCI: convert to pci_scan_root_bus() for correct root bus resources sparc/PCI: convert to pci_create_root_bus() sh/PCI: convert to pci_scan_root_bus() for correct root bus resources powerpc/PCI: convert to pci_create_root_bus() powerpc/PCI: split PHB part out of pcibios_map_io_space() ... Fix up conflicts in drivers/pci/msi.c and include/linux/pci_regs.h due to the same patches being applied in other branches.	2012-01-11 18:50:26 -08:00
Linus Torvalds	40ba587923	Merge branch 'akpm' (aka "Andrew's patch-bomb") Andrew elucidates: - First installmeant of MM. We have a HUGE number of MM patches this time. It's crazy. - MAINTAINERS updates - backlight updates - leds - checkpatch updates - misc ELF stuff - rtc updates - reiserfs - procfs - some misc other bits * akpm: (124 commits) user namespace: make signal.c respect user namespaces workqueue: make alloc_workqueue() take printf fmt and args for name procfs: add hidepid= and gid= mount options procfs: parse mount options procfs: introduce the /proc/<pid>/map_files/ directory procfs: make proc_get_link to use dentry instead of inode signal: add block_sigmask() for adding sigmask to current->blocked sparc: make SA_NOMASK a synonym of SA_NODEFER reiserfs: don't lock root inode searching reiserfs: don't lock journal_init() reiserfs: delay reiserfs lock until journal initialization reiserfs: delete comments referring to the BKL drivers/rtc/interface.c: fix alarm rollover when day or month is out-of-range drivers/rtc/rtc-twl.c: add DT support for RTC inside twl4030/twl6030 drivers/rtc/: remove redundant spi driver bus initialization drivers/rtc/rtc-jz4740.c: make jz4740_rtc_driver static drivers/rtc/rtc-mc13xxx.c: make mc13xxx_rtc_idtable static rtc: convert drivers/rtc/* to use module_platform_driver() drivers/rtc/rtc-wm831x.c: convert to devm_kzalloc() drivers/rtc/rtc-wm831x.c: remove unused period IRQ handler ...	2012-01-10 16:42:48 -08:00
Matt Fleming	5e6292c0f2	signal: add block_sigmask() for adding sigmask to current->blocked Abstract the code sequence for adding a signal handler's sa_mask to current->blocked because the sequence is identical for all architectures. Furthermore, in the past some architectures actually got this code wrong, so introduce a wrapper that all architectures can use. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-10 16:30:54 -08:00
Linus Torvalds	1c8106528a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (53 commits) iommu/amd: Set IOTLB invalidation timeout iommu/amd: Init stats for iommu=pt iommu/amd: Remove unnecessary cache flushes in amd_iommu_resume iommu/amd: Add invalidate-context call-back iommu/amd: Add amd_iommu_device_info() function iommu/amd: Adapt IOMMU driver to PCI register name changes iommu/amd: Add invalid_ppr callback iommu/amd: Implement notifiers for IOMMUv2 iommu/amd: Implement IO page-fault handler iommu/amd: Add routines to bind/unbind a pasid iommu/amd: Implement device aquisition code for IOMMUv2 iommu/amd: Add driver stub for AMD IOMMUv2 support iommu/amd: Add stat counter for IOMMUv2 events iommu/amd: Add device errata handling iommu/amd: Add function to get IOMMUv2 domain for pdev iommu/amd: Implement function to send PPR completions iommu/amd: Implement functions to manage GCR3 table iommu/amd: Implement IOMMUv2 TLB flushing routines iommu/amd: Add support for IOMMUv2 domain mode iommu/amd: Add amd_iommu_domain_direct_map function ...	2012-01-10 11:08:21 -08:00
Linus Torvalds	3dcf6c1b6b	Merge branch 'kvm-updates/3.3' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/3.3' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (74 commits) KVM: PPC: Whitespace fix for kvm.h KVM: Fix whitespace in kvm_para.h KVM: PPC: annotate kvm_rma_init as __init KVM: x86 emulator: implement RDPMC (0F 33) KVM: x86 emulator: fix RDPMC privilege check KVM: Expose the architectural performance monitoring CPUID leaf KVM: VMX: Intercept RDPMC KVM: SVM: Intercept RDPMC KVM: Add generic RDPMC support KVM: Expose a version 2 architectural PMU to a guests KVM: Expose kvm_lapic_local_deliver() KVM: x86 emulator: Use opcode::execute for Group 9 instruction KVM: x86 emulator: Use opcode::execute for Group 4/5 instructions KVM: x86 emulator: Use opcode::execute for Group 1A instruction KVM: ensure that debugfs entries have been created KVM: drop bsp_vcpu pointer from kvm struct KVM: x86: Consolidate PIT legacy test KVM: x86: Do not rely on implicit inclusions KVM: Make KVM_INTEL depend on CPU_SUP_INTEL KVM: Use memdup_user instead of kmalloc/copy_from_user ...	2012-01-10 09:57:11 -08:00
Linus Torvalds	5983faf942	Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty * 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (65 commits) tty: serial: imx: move del_timer_sync() to avoid potential deadlock imx: add polled io uart methods imx: Add save/restore functions for UART control regs serial/imx: let probing fail for the dt case without a valid alias serial/imx: propagate error from of_alias_get_id instead of using -ENODEV tty: serial: imx: Allow UART to be a source for wakeup serial: driver for m32 arch should not have DEC alpha errata serial/documentation: fix documented name of DCD cpp symbol atmel_serial: fix spinlock lockup in RS485 code tty: Fix memory leak in virtual console when enable unicode translation serial: use DIV_ROUND_CLOSEST instead of open coding it serial: add support for 400 and 800 v3 series Titan cards serial: bfin-uart: Remove ASYNC_CTS_FLOW flag for hardware automatic CTS. serial: bfin-uart: Enable hardware automatic CTS only when CTS pin is available. serial: make FSL errata depend on 8250_CONSOLE, not just 8250 serial: add irq handler for Freescale 16550 errata. serial: manually inline serial8250_handle_port serial: make 8250 timeout use the specified IRQ handler serial: export the key functions for an 8250 IRQ handler serial: clean up parameter passing for 8250 Rx IRQ handling ...	2012-01-09 12:09:24 -08:00
Joerg Roedel	f93ea73387	Merge branches 'iommu/page-sizes' and 'iommu/group-id' into next Conflicts: drivers/iommu/amd_iommu.c drivers/iommu/intel-iommu.c include/linux/iommu.h	2012-01-09 13:06:28 +01:00
Linus Torvalds	972b2c7199	Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits) reiserfs: Properly display mount options in /proc/mounts vfs: prevent remount read-only if pending removes vfs: count unlinked inodes vfs: protect remounting superblock read-only vfs: keep list of mounts for each superblock vfs: switch ->show_options() to struct dentry * vfs: switch ->show_path() to struct dentry * vfs: switch ->show_devname() to struct dentry * vfs: switch ->show_stats to struct dentry * switch security_path_chmod() to struct path * vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb vfs: trim includes a bit switch mnt_namespace ->root to struct mount vfs: take /proc//mounts and friends to fs/proc_namespace.c vfs: opencode mntget() mnt_set_mountpoint() vfs: spread struct mount - remaining argument of next_mnt() vfs: move fsnotify junk to struct mount vfs: move mnt_devname vfs: move mnt_list to struct mount vfs: switch pnode.h macros to struct mount ...	2012-01-08 12:19:57 -08:00
Jack Steiner	da517a08ac	x86, UV: Update Boot messages for SGI UV2 platform SGI UV systems print a message during boot: UV: Found <num> blades Due to packaging changes, the blade count is not accurate for on the next generation of the platform. This patch corrects the count. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/r/20120106191900.GA19772@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2012-01-08 12:35:44 +01:00
Ingo Molnar	636f0c70f2	Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core	2012-01-08 12:31:24 +01:00
Linus Torvalds	7affca3537	Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits) arm: fix up some samsung merge sysdev conversion problems firmware: Fix an oops on reading fw_priv->fw in sysfs loading file Drivers:hv: Fix a bug in vmbus_driver_unregister() driver core: remove __must_check from device_create_file debugfs: add missing #ifdef HAS_IOMEM arm: time.h: remove device.h #include driver-core: remove sysdev.h usage. clockevents: remove sysdev.h arm: convert sysdev_class to a regular subsystem arm: leds: convert sysdev_class to a regular subsystem kobject: remove kset_find_obj_hinted() m86k: gpio - convert sysdev_class to a regular subsystem mips: txx9_sram - convert sysdev_class to a regular subsystem mips: 7segled - convert sysdev_class to a regular subsystem sh: dma - convert sysdev_class to a regular subsystem sh: intc - convert sysdev_class to a regular subsystem power: suspend - convert sysdev_class to a regular subsystem power: qe_ic - convert sysdev_class to a regular subsystem power: cmm - convert sysdev_class to a regular subsystem s390: time - convert sysdev_class to a regular subsystem ... Fix up conflicts with 'struct sysdev' removal from various platform drivers that got changed: - arch/arm/mach-exynos/cpu.c - arch/arm/mach-exynos/irq-eint.c - arch/arm/mach-s3c64xx/common.c - arch/arm/mach-s3c64xx/cpu.c - arch/arm/mach-s5p64x0/cpu.c - arch/arm/mach-s5pv210/common.c - arch/arm/plat-samsung/include/plat/cpu.h - arch/powerpc/kernel/sysfs.c and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h	2012-01-07 12:03:30 -08:00
Ingo Molnar	03f70388c3	Merge branch 'tip/x86/core-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core	2012-01-07 13:25:49 +01:00
Don Zickus	e58d429209	x86, reboot: Fix typo in nmi reboot path It was brought to my attention that my x86 change to use NMI in the reboot path broke Intel Nehalem and Westmere boxes when using kexec. I realized I had mistyped the if statement in commit `3603a2512f` and stuck the ')' in the wrong spot. Putting it in the right spot fixes kexec again. Doh. Reported-by: Yinghai Lu <yinghai@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/r/1325866671-9797-1-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2012-01-07 12:19:37 +01:00
Linus Torvalds	edf7c8148e	Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: add IRQ context simulation in module mce-inject x86, mce, therm_throt: Don't report power limit and package level thermal throttle events in mcelog x86, MCE: Drain mcelog buffer x86, mce: Add wrappers for registering on the decode chain	2012-01-06 15:02:37 -08:00
Linus Torvalds	82406da4a6	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode, AMD: Update copyrights x86, microcode, AMD: Exit early on success x86, microcode, AMD: Simplify ucode verification x86, microcode, AMD: Add a reusable buffer x86, microcode, AMD: Add a vendor-specific exit function	2012-01-06 15:02:14 -08:00
Konrad Rzeszutek Wilk	76ccc29701	x86/PCI: Expand the x86_msi_ops to have a restore MSIs. The MSI restore function will become a function pointer in an x86_msi_ops struct. It defaults to the implementation in the io_apic.c and msi.c. We piggyback on the indirection mechanism introduced by "x86: Introduce x86_msi_ops". Cc: x86@kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2012-01-06 14:02:26 -08:00
Linus Torvalds	2c364faabb	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, centaur: Enable cx8 for VIA Eden too	2012-01-06 14:00:44 -08:00
Linus Torvalds	cf3f33551b	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Use "do { } while(0)" for empty lock_cmos()/unlock_cmos() macros x86: Use "do { } while(0)" for empty flush_tlb_fix_spurious_fault() macro x86, CPU: Drop superfluous get_cpu_cap() prototype arch/x86/mm/pageattr.c: Quiet sparse noise; local functions should be static arch/x86/kernel/ptrace.c: Quiet sparse noise x86: Use kmemdup() in copy_thread(), rather than duplicating its implementation x86: Replace the EVT_TO_HPET_DEV() macro with an inline function	2012-01-06 14:00:12 -08:00
Linus Torvalds	69734b644b	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86: Fix atomic64_xxx_cx8() functions x86: Fix and improve cmpxchg_double{,_local}() x86_64, asm: Optimise fls(), ffs() and fls64() x86, bitops: Move fls64.h inside __KERNEL__ x86: Fix and improve percpu_cmpxchg{8,16}b_double() x86: Report cpb and eff_freq_ro flags correctly x86/i386: Use less assembly in strlen(), speed things up a bit x86: Use the same node_distance for 32 and 64-bit x86: Fix rflags in FAKE_STACK_FRAME x86: Clean up and extend do_int3() x86: Call do_notify_resume() with interrupts enabled x86/div64: Add a micro-optimization shortcut if base is power of two x86-64: Cleanup some assembly entry points x86-64: Slightly shorten line system call entry and exit paths x86-64: Reduce amount of redundant code generated for invalidate_interruptNN x86-64: Slightly shorten int_ret_from_sys_call x86, efi: Convert efi_phys_get_time() args to physical addresses x86: Default to vsyscall=emulate x86-64: Set siginfo and context on vsyscall emulation faults x86: consolidate xchg and xadd macros ...	2012-01-06 13:59:14 -08:00
Linus Torvalds	67b0243131	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Skip cpus with apic-ids >= 255 in !x2apic_mode x86, x2apic: Allow "nox2apic" to disable x2apic mode setup by BIOS x86, x2apic: Fallback to xapic when BIOS doesn't setup interrupt-remapping x86, acpi: Skip acpi x2apic entries if the x2apic feature is not present x86, apic: Add probe() for apic_flat x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86' x86: Convert per-cpu counter icr_read_retry_count into a member of irq_stat x86: Add per-cpu stat counter for APIC ICR read tries pci, x86/io-apic: Allow PCI_IOAPIC to be user configurable on x86 x86: Fix the !CONFIG_NUMA build of the new CPU ID fixup code support x86: Add NumaChip support x86: Add x86_init platform override to fix up NUMA core numbering x86: Make flat_init_apic_ldr() available	2012-01-06 13:58:21 -08:00
Linus Torvalds	376613e81d	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, tsc: Skip TSC synchronization checks for tsc=reliable clocksource: Convert tcb_clksrc to use clocksource_register_hz/khz clocksource: cris: Convert to clocksource_register_khz clocksource: xtensa: Convert to clocksource_register_hz/khz clocksource: um: Convert to clocksource_register_hz/khz clocksource: parisc: Convert to clocksource_register_hz/khz clocksource: m86k: Convert to clocksource_register_hz/khz time: x86: Replace LATCH with PIT_LATCH in i8253 clocksource driver time: x86: Remove CLOCK_TICK_RATE from acpi_pm clocksource driver time: x86: Remove CLOCK_TICK_RATE from mach_timer.h time: x86: Remove CLOCK_TICK_RATE from tsc code time: Fix spelling mistakes in new comments time: fix bogus comment in timekeeping_get_ns_raw	2012-01-06 13:57:44 -08:00
Bjorn Helgaas	24d25dbfa6	x86/PCI: amd: factor out MMCONFIG discovery This factors out the AMD native MMCONFIG discovery so we can use it outside amd_bus.c. amd_bus.c reads AMD MSRs so it can remove the MMCONFIG area from the PCI resources. We may also need the MMCONFIG information to work around BIOS defects in the ACPI MCFG table. Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: stable@kernel.org # 2.6.34+ Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2012-01-06 12:11:19 -08:00
Greg Kroah-Hartman	ff4b8a57f0	Merge branch 'driver-core-next' into Linux 3.2 This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file, and it fixes the build error in the arch/x86/kernel/microcode_core.c file, that the merge did not catch. The microcode_core.c patch was provided by Stephen Rothwell <sfr@canb.auug.org.au> who was invaluable in the merge issues involved with the large sysdev removal process in the driver-core tree. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2012-01-06 11:42:52 -08:00
Linus Torvalds	35b740e466	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (106 commits) perf kvm: Fix copy & paste error in description perf script: Kill script_spec__delete perf top: Fix a memory leak perf stat: Introduce get_ratio_color() helper perf session: Remove impossible condition check perf tools: Fix feature-bits rework fallout, remove unused variable perf script: Add generic perl handler to process events perf tools: Use for_each_set_bit() to iterate over feature flags perf tools: Unify handling of features when writing feature section perf report: Accept fifos as input file perf tools: Moving code in some files perf tools: Fix out-of-bound access to struct perf_session perf tools: Continue processing header on unknown features perf tools: Improve macros for struct feature_ops perf: builtin-record: Document and check that mmap_pages must be a power of two. perf: builtin-record: Provide advice if mmap'ing fails with EPERM. perf tools: Fix truncated annotation perf script: look up thread using tid instead of pid perf tools: Look up thread names for system wide profiling perf tools: Fix comm for processes with named threads ...	2012-01-06 08:02:58 -08:00
Linus Torvalds	423d091dfe	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) cpu: Export cpu_up() rcu: Apply ACCESS_ONCE() to rcu_boost() return value Revert "rcu: Permit rt_mutex_unlock() with irqs disabled" docs: Additional LWN links to RCU API rcu: Augment rcu_batch_end tracing for idle and callback state rcu: Add rcutorture tests for srcu_read_lock_raw() rcu: Make rcutorture test for hotpluggability before offlining CPUs driver-core/cpu: Expose hotpluggability to the rest of the kernel rcu: Remove redundant rcu_cpu_stall_suppress declaration rcu: Adaptive dyntick-idle preparation rcu: Keep invoking callbacks if CPU otherwise idle rcu: Irq nesting is always 0 on rcu_enter_idle_common rcu: Don't check irq nesting from rcu idle entry/exit rcu: Permit dyntick-idle with callbacks pending rcu: Document same-context read-side constraints rcu: Identify dyntick-idle CPUs on first force_quiescent_state() pass rcu: Remove dynticks false positives and RCU failures rcu: Reduce latency of rcu_prepare_for_idle() rcu: Eliminate RCU_FAST_NO_HZ grace-period hang rcu: Avoid needlessly IPIing CPUs at GP end ...	2012-01-06 08:02:40 -08:00
Linus Torvalds	4a2164a7db	Merge branch 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) memblock: Reimplement memblock allocation using reverse free area iterator memblock: Kill early_node_map[] score: Use HAVE_MEMBLOCK_NODE_MAP s390: Use HAVE_MEMBLOCK_NODE_MAP mips: Use HAVE_MEMBLOCK_NODE_MAP ia64: Use HAVE_MEMBLOCK_NODE_MAP SuperH: Use HAVE_MEMBLOCK_NODE_MAP sparc: Use HAVE_MEMBLOCK_NODE_MAP powerpc: Use HAVE_MEMBLOCK_NODE_MAP memblock: Implement memblock_add_node() memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users memblock: Track total size of regions automatically powerpc: Cleanup memblock usage memblock: Reimplement memblock_enforce_memory_limit() using __memblock_remove() memblock: Make memblock functions handle overflowing range @size memblock: Reimplement __memblock_remove() using memblock_isolate_range() memblock: Separate out memblock_isolate_range() from memblock_set_node() memblock: Kill memblock_init() memblock: Kill sentinel entries at the end of static region arrays memblock: Add __memblock_dump_all() ...	2012-01-06 07:54:53 -08:00
Ingo Molnar	adaf4ed2ab	Merge commit 'v3.2-rc7' into x86/asm Merge reason: Update from -rc4 to -rc7. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2012-01-04 15:01:28 +01:00
Al Viro	2c9ede55ec	switch device_get_devnode() and ->devnode() to umode_t * both callers of device_get_devnode() are only interested in lower 16bits and nobody tries to return anything wider than 16bit anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:54:55 -05:00
Tony Luck	5f7b88d51e	x86/mce: Recognise machine check bank signature for data path error Action required data path signature is defined in table 15-19 of SDM: +-----------------------------------------------------------------------------+ \| SRAR Error \| Valid \| OVER \| UC \| EN \| MISCV \| ADDRV \| PCC \| S \| AR \| MCACOD \| \| Data Load \| 1 \| 0 \| 1 \| 1 \| 1 \| 1 \| 0 \| 1 \| 1 \| 0x134 \| +-----------------------------------------------------------------------------+ Recognise this, and pass MCE_AR_SEVERITY code back to do_machine_check() if we have the action handler configured (CONFIG_MEMORY_FAILURE=y) Acked-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-01-03 12:07:07 -08:00
Tony Luck	a8c321fbf9	x86/mce: Handle "action required" errors All non-urgent actions (reporting low severity errors and handling "action-optional" errors) are now handled by a work queue. This means that TIF_MCE_NOTIFY can be used to block execution for a thread experiencing an "action-required" fault until we get all cpus out of the machine check handler (and the thread that hit the fault into mce_notify_process(). We use the new mce_{save,find,clear}_info() API to get information from do_machine_check() to mce_notify_process(), and then use the newly improved memory_failure(..., MF_ACTION_REQUIRED) to handle the error (possibly signalling the process). Update some comments to make the new code flows clearer. Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-01-03 12:07:01 -08:00
Tony Luck	af104e394e	x86/mce: Add mechanism to safely save information in MCE handler Machine checks on Intel cpus interrupt execution on all cpus, regardless of interrupt masking. We have a need to save some data about the cause of the machine check (physical address) in the machine check handler that can be retrieved later to attempt recovery in a more flexible execution state. Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-01-03 12:06:53 -08:00
Tony Luck	85f92694af	x86/mce: Create helper function to save addr/misc when needed The MCI_STATUS_MISCV and MCI_STATUS_ADDRV bits in the bank status registers define whether the MISC and ADDR registers respectively contain valid data - provide a helper function to check these bits and read the registers when needed. In addition, processors that support software error recovery (as indicated by the MCG_SER_P bit in the MCG_CAP register) may include some undefined bits in the ADDR register - mask these out. Acked-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-01-03 12:06:45 -08:00
Tony Luck	cd42f4a3b2	HWPOISON: Clean up memory_failure() vs. __memory_failure() There is only one caller of memory_failure(), all other users call __memory_failure() and pass in the flags argument explicitly. The lone user of memory_failure() will soon need to pass flags too. Add flags argument to the callsite in mce.c. Delete the old memory_failure() function, and then rename __memory_failure() without the leading "__". Provide clearer message when action optional memory errors are ignored. Acked-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-01-03 12:06:32 -08:00
Avi Kivity	9e31905f29	Merge remote-tracking branch 'tip/perf/core' into kvm-updates/3.3 * tip/perf/core: (66 commits) perf, x86: Expose perf capability to other modules perf, x86: Implement arch event mask as quirk x86, perf: Disable non available architectural events jump_label: Provide jump_label_key initializers jump_label, x86: Fix section mismatch perf, core: Rate limit perf_sched_events jump_label patching perf: Fix enable_on_exec for sibling events perf: Remove superfluous arguments perf, x86: Prefer fixed-purpose counters when scheduling perf, x86: Fix event scheduler for constraints with overlapping counters perf, x86: Implement event scheduler helper functions perf: Avoid a useless pmu_disable() in the perf-tick x86/tools: Add decoded instruction dump mode x86: Update instruction decoder to support new AVX formats x86/tools: Fix insn_sanity message outputs x86/tools: Fix instruction decoder message output x86: Fix instruction decoder to handle grouped AVX instructions x86/tools: Fix Makefile to build all test tools perf test: Soft errors shouldn't stop the "Validate PERF_RECORD_" test perf test: Validate PERF_RECORD_ events and perf_sample fields ... Signed-off-by: Avi Kivity <avi@redhat.com> * commit 'b3d9468a8bd218a695e3a0ff112cd4efd27b670a': (66 commits) perf, x86: Expose perf capability to other modules perf, x86: Implement arch event mask as quirk x86, perf: Disable non available architectural events jump_label: Provide jump_label_key initializers jump_label, x86: Fix section mismatch perf, core: Rate limit perf_sched_events jump_label patching perf: Fix enable_on_exec for sibling events perf: Remove superfluous arguments perf, x86: Prefer fixed-purpose counters when scheduling perf, x86: Fix event scheduler for constraints with overlapping counters perf, x86: Implement event scheduler helper functions perf: Avoid a useless pmu_disable() in the perf-tick x86/tools: Add decoded instruction dump mode x86: Update instruction decoder to support new AVX formats x86/tools: Fix insn_sanity message outputs x86/tools: Fix instruction decoder message output x86: Fix instruction decoder to handle grouped AVX instructions x86/tools: Fix Makefile to build all test tools perf test: Soft errors shouldn't stop the "Validate PERF_RECORD_" test perf test: Validate PERF_RECORD_ events and perf_sample fields ...	2011-12-27 11:22:24 +02:00
Chris Wright	5202397df8	KVM guest: remove KVM guest pv mmu support This has not been used for some years now. It's time to remove it. Signed-off-by: Chris Wright <chrisw@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-12-27 11:17:08 +02:00
Suresh Siddha	c284b42aba	x86: Skip cpus with apic-ids >= 255 in !x2apic_mode If the x2apic mode is disabled for reasons like interrupt-remapping not available etc, then we need to skip the logical cpu bringup of apic-id's >= 255. Otherwise as the platform is in xapic mode, init/startup IPI's will consider only the low 8-bits and there is a possibility of re-sending init/startup IPI's to the logical cpu that is already online. This will avoid potential reboots/unpredictable behavior etc. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/20111222014632.702932458@sbsiddha-desk.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-12-23 11:01:49 -08:00
Yinghai Lu	a31bc32760	x86, x2apic: Allow "nox2apic" to disable x2apic mode setup by BIOS Currently "nox2apic" boot parameter was not enabling x2apic mode if the cpu, kernel are all capable of enabling x2apic mode and the OS handover happened in xapic mode. However If the bios enabled x2apic prior to OS handover, using "nox2apic" boot parameter had no effect. If the boot cpu's apicid is < 255, enable "nox2apic" boot parameter to disable the x2apic mode setup by the bios. This will enable the kernel to fallback to xapic mode and bringup only the cpu's which has apic-id < 255. -v2: fix patch error and two compiling warning make disable_x2apic to be __init Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/CAE9FiQUeB-3uxJAMiHsz=uPWoFv5Hg1pVepz7aU6YtqOxMC-=Q@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-12-23 11:01:43 -08:00
Yinghai Lu	fb209bd891	x86, x2apic: Fallback to xapic when BIOS doesn't setup interrupt-remapping On some of the recent Intel SNB platforms, by default bios is pre-enabling x2apic mode in the cpu with out setting up interrupt-remapping. This case was resulting in the kernel to panic as the cpu is already in x2apic mode but the OS was not able to enable interrupt-remapping (which is a pre-req for using x2apic capability). On these platforms all the apic-ids are < 255 and the kernel can fallback to xapic mode if the bios has not enabled interrupt-remapping (which is mostly the case if the bios has not exported interrupt-remapping tables to the OS). Reported-by: Berck E. Nash <flyboy@gmail.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20111222014632.600418637@sbsiddha-desk.sc.intel.com Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-12-23 11:01:01 -08:00
Yinghai Lu	a35fd28256	x86, acpi: Skip acpi x2apic entries if the x2apic feature is not present If the x2apic feature is not present (either the cpu is not capable of it or the user has disabled the feature using boot-parameter etc), ignore the x2apic MADT and SRAT entries provided by the ACPI tables. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20111222014632.540896503@sbsiddha-desk.sc.intel.com Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-12-23 11:00:50 -08:00
Yinghai Lu	e8524b2f43	x86, apic: Add probe() for apic_flat Currently we start with the default apic_flat mode and switch to some other apic model depending on the apic drivers acpi_madt_oem_check() routines and later followed by the apic drivers probe() routines. Once we selected non flat mode there was no case where we fall back to flat mode again. Upcoming changes allow bios-enabled x2apic mode to be disabled by the OS if interrupt-remapping etc is not setup properly by the bios. We now has a case for the apic to fall back to legacy flat mode during apic driver probe() seqeuence. Add a simple flat_probe() which allows the apic_flat mode to be the last fallback option. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20111222014632.484984298@sbsiddha-desk.sc.intel.com Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-12-23 11:00:45 -08:00
Robert Richter	2e64694de2	perf/x86: Fix raw_spin_unlock_irqrestore() usage Use raw_spin_unlock_irqrestore() as equivalent to raw_spin_lock_irqsave(). Signed-off-by: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1324646665-13334-1-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-23 17:57:01 +01:00
Kay Sievers	edbaa603eb	driver-core: remove sysdev.h usage. The sysdev.h file should not be needed by any in-kernel code, so remove the .h file from these random files that seem to still want to include it. The sysdev code will be going away soon, so this include needs to be removed no matter what. Cc: Jiandong Zheng <jdzheng@broadcom.com> Cc: Scott Branden <sbranden@broadcom.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Kukjin Kim <kgene.kim@samsung.com> Cc: David Brown <davidb@codeaurora.org> Cc: Daniel Walker <dwalker@fifo99.com> Cc: Bryan Huntsman <bryanh@codeaurora.org> Cc: Ben Dooks <ben-linux@fluff.org> Cc: Wan ZongShun <mcuos.com@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: "Venkatesh Pallipadi Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>	2011-12-21 16:26:03 -08:00
Kay Sievers	8a25a2fd12	cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem and converts the devices to regular devices. The sysdev drivers are implemented as subsystem interfaces now. After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Userspace relies on events and generic sysfs subsystem infrastructure from sysdev devices, which are made available with this conversion. Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@amd64.org> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Len Brown <lenb@kernel.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Dave Jones <davej@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-12-21 14:29:42 -08:00
Steven Rostedt	42181186ad	x86: Add counter when debug stack is used with interrupts enabled Mathieu Desnoyers pointed out a case that can cause issues with NMIs running on the debug stack: int3 -> interrupt -> NMI -> int3 Because the interrupt changes the stack, the NMI will not see that it preempted the debug stack. Looking deeper at this case, interrupts only happen when the int3 is from userspace or in an a location in the exception table (fixup). userspace -> int3 -> interurpt -> NMI -> int3 All other int3s that happen in the kernel should be processed without ever enabling interrupts, as the do_trap() call will panic the kernel if it is called to process any other location within the kernel. Adding a counter around the sections that enable interrupts while using the debug stack allows the NMI to also check that case. If the NMI sees that it either interrupted a task using the debug stack or the debug counter is non-zero, then it will have to change the IDT table to make the int3 not change stacks (which will corrupt the stack if it does). Note, I had to move the debug_usage functions out of processor.h and into debugreg.h because of the static inlined functions to inc and dec the debug_usage counter. __get_cpu_var() requires smp.h which includes processor.h, and would fail to build. Link: http://lkml.kernel.org/r/1323976535.23971.112.camel@gandalf.stny.rr.com Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul Turner <pjt@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-12-21 15:38:56 -05:00
Steven Rostedt	ccd49c2391	x86: Allow NMIs to hit breakpoints in i386 With i386, NMIs and breakpoints use the current stack and they do not reset the stack pointer to a fix point that might corrupt a previous NMI or breakpoint (as it does in x86_64). But NMIs are still not made to be re-entrant, and need to prevent the case that an NMI hitting a breakpoint (which does an iret), doesn't allow another NMI to run. The fix is to let the NMI be in 3 different states: 1) not running 2) executing 3) latched When no NMI is executing on a given CPU, the state is "not running". When the first NMI comes in, the state is switched to "executing". On exit of that NMI, a cmpxchg is performed to switch the state back to "not running" and if that fails, the NMI is restarted. If a breakpoint is hit and does an iret, which re-enables NMIs, and another NMI comes in before the first NMI finished, it will detect that the state is not in the "not running" state and the current NMI is nested. In this case, the state is switched to "latched" to let the interrupted NMI know to restart the NMI handler, and the nested NMI exits without doing anything. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul Turner <pjt@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-12-21 15:38:55 -05:00
Steven Rostedt	228bdaa95f	x86: Keep current stack in NMI breakpoints We want to allow NMI handlers to have breakpoints to be able to remove stop_machine from ftrace, kprobes and jump_labels. But if an NMI interrupts a current breakpoint, and then it triggers a breakpoint itself, it will switch to the breakpoint stack and corrupt the data on it for the breakpoint processing that it interrupted. Instead, have the NMI check if it interrupted breakpoint processing by checking if the stack that is currently used is a breakpoint stack. If it is, then load a special IDT that changes the IST for the debug exception to keep the same stack in kernel context. When the NMI is done, it puts it back. This way, if the NMI does trigger a breakpoint, it will keep using the same stack and not stomp on the breakpoint data for the breakpoint it interrupted. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-12-21 15:38:55 -05:00
Steven Rostedt	3f3c8b8c4b	x86: Add workaround to NMI iret woes In x86, when an NMI goes off, the CPU goes into an NMI context that prevents other NMIs to trigger on that CPU. If an NMI is suppose to trigger, it has to wait till the previous NMI leaves NMI context. At that time, the next NMI can trigger (note, only one more NMI will trigger, as only one can be latched at a time). The way x86 gets out of NMI context is by calling iret. The problem with this is that this causes problems if the NMI handle either triggers an exception, or a breakpoint. Both the exception and the breakpoint handlers will finish with an iret. If this happens while in NMI context, the CPU will leave NMI context and a new NMI may come in. As NMI handlers are not made to be re-entrant, this can cause havoc with the system, not to mention, the nested NMI will write all over the previous NMI's stack. Linus Torvalds proposed the following workaround to this problem: https://lkml.org/lkml/2010/7/14/264 "In fact, I wonder if we couldn't just do a software NMI disable instead? Hav ea per-cpu variable (in the _core_ percpu areas that get allocated statically) that points to the NMI stack frame, and just make the NMI code itself do something like NMI entry: - load percpu NMI stack frame pointer - if non-zero we know we're nested, and should ignore this NMI: - we're returning to kernel mode, so return immediately by using "popf/ret", which also keeps NMI's disabled in the hardware until the "real" NMI iret happens. - before the popf/iret, use the NMI stack pointer to make the NMI return stack be invalid and cause a fault - set the NMI stack pointer to the current stack pointer NMI exit (not the above "immediate exit because we nested"): clear the percpu NMI stack pointer Just do the iret. Now, the thing is, now the "iret" is atomic. If we had a nested NMI, we'll take a fault, and that re-does our "delayed" NMI - and NMI's will stay masked. And if we didn't have a nested NMI, that iret will now unmask NMI's, and everything is happy." I first tried to follow this advice but as I started implementing this code, a few gotchas showed up. One, is accessing per-cpu variables in the NMI handler. The problem is that per-cpu variables use the %gs register to get the variable for the given CPU. But as the NMI may happen in userspace, we must first perform a SWAPGS to get to it. The NMI handler already does this later in the code, but its too late as we have saved off all the registers and we don't want to do that for a disabled NMI. Peter Zijlstra suggested to keep all variables on the stack. This simplifies things greatly and it has the added benefit of cache locality. Two, faulting on the iret. I really wanted to make this work, but it was becoming very hacky, and I never got it to be stable. The iret already had a fault handler for userspace faulting with bad segment registers, and getting NMI to trigger a fault and detect it was very tricky. But for strange reasons, the system would usually take a double fault and crash. I never figured out why and decided to go with a simple "jmp" approach. The new approach I took also simplified things. Finally, the last problem with Linus's approach was to have the nested NMI handler do a ret instead of an iret to give the first NMI NMI-context again. The problem is that ret is much more limited than an iret. I couldn't figure out how to get the stack back where it belonged. I could have copied the current stack, pushed the return onto it, but my fear here is that there may be some place that writes data below the stack pointer. I know that is not something code should depend on, but I don't want to chance it. I may add this feature later, but for now, an NMI handler that loses NMI context will not get it back. Here's what is done: When an NMI comes in, the HW pushes the interrupt stack frame onto the per cpu NMI stack that is selected by the IST. A special location on the NMI stack holds a variable that is set when the first NMI handler runs. If this variable is set then we know that this is a nested NMI and we process the nested NMI code. There is still a race when this variable is cleared and an NMI comes in just before the first NMI does the return. For this case, if the variable is cleared, we also check if the interrupted stack is the NMI stack. If it is, then we process the nested NMI code. Why the two tests and not just test the interrupted stack? If the first NMI hits a breakpoint and loses NMI context, and then it hits another breakpoint and while processing that breakpoint we get a nested NMI. When processing a breakpoint, the stack changes to the breakpoint stack. If another NMI comes in here we can't rely on the interrupted stack to be the NMI stack. If the variable is not set and the interrupted task's stack is not the NMI stack, then we know this is the first NMI and we can process things normally. But in order to do so, we need to do a few things first. 1) Set the stack variable that tells us that we are in an NMI handler 2) Make two copies of the interrupt stack frame. One copy is used to return on iret The other is used to restore the first one if we have a nested NMI. This is what the stack will look like: +-------------------------+ \| original SS \| \| original Return RSP \| \| original RFLAGS \| \| original CS \| \| original RIP \| +-------------------------+ \| temp storage for rdx \| +-------------------------+ \| NMI executing variable \| +-------------------------+ \| Saved SS \| \| Saved Return RSP \| \| Saved RFLAGS \| \| Saved CS \| \| Saved RIP \| +-------------------------+ \| copied SS \| \| copied Return RSP \| \| copied RFLAGS \| \| copied CS \| \| copied RIP \| +-------------------------+ \| pt_regs \| +-------------------------+ The original stack frame contains what the HW put in when we entered the NMI. We store %rdx as a temp variable to use. Both the original HW stack frame and this %rdx storage will be clobbered by nested NMIs so we can not rely on them later in the first NMI handler. The next item is the special stack variable that is set when we execute the rest of the NMI handler. Then we have two copies of the interrupt stack. The second copy is modified by any nested NMIs to let the first NMI know that we triggered a second NMI (latched) and that we should repeat the NMI handler. If the first NMI hits an exception or breakpoint that takes it out of NMI context, if a second NMI comes in before the first one finishes, it will update the copied interrupt stack to point to a fix up location to trigger another NMI. When the first NMI calls iret, it will instead jump to the fix up location. This fix up location will copy the saved interrupt stack back to the copy and execute the nmi handler again. Note, the nested NMI knows enough to check if it preempted a previous NMI handler while it is in the fixup location. If it has, it will not modify the copied interrupt stack and will just leave as if nothing happened. As the NMI handle is about to execute again, there's no reason to latch now. To test all this, I forced the NMI handler to call iret and take itself out of NMI context. I also added assemble code to write to the serial to make sure that it hits the nested path as well as the fix up path. Everything seems to be working fine. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul Turner <pjt@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-12-21 15:38:54 -05:00
Steven Rostedt	1fd466efc8	x86: Document the NMI handler about not using paranoid_exit Linus cleaned up the NMI handler but it still needs some comments to explain why it uses save_paranoid but not paranoid_exit. Just to keep others from adding that in the future, document why it's not used. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-12-21 15:38:53 -05:00
Linus Torvalds	549c89b98c	x86: Do not schedule while still in NMI context The NMI handler uses the paranoid_exit routine that checks the NEED_RESCHED flag, and if it is set and the return is for userspace, then interrupts are enabled, the stack is swapped to the thread's stack, and schedule is called. The problem with this is that we are still in an NMI context until an iret is executed. This means that any new NMIs are now starved until an interrupt or exception occurs and does the iret. As NMIs can not be masked and can interrupt any location, they are treated as a special case. NEED_RESCHED should not be set in an NMI handler. The interruption by the NMI should not disturb the work flow for scheduling. Any IPI sent to a processor after sending the NEED_RESCHED would have to wait for the NMI anyway, and after the IPI finishes the schedule would be called as required. There is no reason to do anything special leaving an NMI. Remove the call to paranoid_exit and do a simple return. This not only fixes the bug of starved NMIs, but it also cleans up the code. Link: http://lkml.kernel.org/r/CA+55aFzgM55hXTs4griX5e9=v_O+=ue+7Rj0PTD=M7hFYpyULQ@mail.gmail.com Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul Turner <pjt@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-12-21 15:38:52 -05:00
Peter Zijlstra	e3f3541c19	perf: Extend the mmap control page with time (TSC) fields Extend the mmap control page with fields so that userspace can compute time deltas relative to the provided time fields. Currently only implemented for x86 with constant and nonstop TSC. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Arun Sharma <asharma@fb.com> Link: http://lkml.kernel.org/n/tip-3u1jucza77j3wuvs0x2bic0f@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-21 11:01:13 +01:00
Peter Zijlstra	0c9d42ed4c	perf, x86: Provide means for disabling userspace RDPMC Allow the disabling of RDPMC via a pmu specific attribute: echo 0 > /sys/bus/event_source/devices/cpu/rdpmc Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Arun Sharma <asharma@fb.com> Link: http://lkml.kernel.org/n/tip-pqeog465zo5hsimtkfz73f27@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-21 11:01:11 +01:00
Peter Zijlstra	fe4a330885	perf, x86: Implement user-space RDPMC support, to allow fast, user-space access to self-monitoring counters Implement a correct pmu::event_idx for the x86 counter index rules and set CR4.PCE on CPU_STARTING. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Arun Sharma <asharma@fb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: http://lkml.kernel.org/n/tip-mwxab34dibqgzk5zywutfnha@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-21 11:01:10 +01:00
Stephane Eranian	9c1497ea59	perf events: Add Intel x86 mapping for PERF_COUNT_HW_REF_CPU_CYCLES Add event maps for Intel x86 processors (with architected PMU v2 or later). On AMD, there is frequency scaling but no Turbo. There is no core cycle event not subject to frequency scaling, therefore we do not provide a mapping. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1323559734-3488-4-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-21 10:26:39 +01:00
Stephane Eranian	cd09c0c40a	perf events: Enable raw event support for Intel unhalted_reference_cycles event This patch adds the encoding and definitions necessary for the unhalted_reference_cycles event avaialble since Intel Core 2 processors. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1323559734-3488-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-21 10:26:32 +01:00
Kevin Winchester	141168c36c	x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86' Several fields in struct cpuinfo_x86 were not defined for the !SMP case, likely to save space. However, those fields still have some meaning for UP, and keeping them allows some #ifdef removal from other files. The additional size of the UP kernel from this change is not significant enough to worry about keeping up the distinction: text data bss dec hex filename 4737168 506459 972040 6215667 5ed7f3 vmlinux.o.before 4737444 506459 972040 6215943 5ed907 vmlinux.o.after for a difference of 276 bytes for an example UP config. If someone wants those 276 bytes back badly then it should be implemented in a cleaner way. Signed-off-by: Kevin Winchester <kjwinchester@gmail.com> Cc: Steffen Persvold <sp@numascale.com> Link: http://lkml.kernel.org/r/1324428742-12498-1-git-send-email-kjwinchester@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-21 09:25:09 +01:00
Ingo Molnar	d87f69a16e	Merge commit 'v3.2-rc6' into perf/core Merge reason: Update with the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-20 20:32:11 +01:00
Ingo Molnar	45aa0663cc	Merge branch 'memblock-kill-early_node_map' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/memblock	2011-12-20 12:14:26 +01:00
Clemens Ladisch	13f541c10b	x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT When printing the code bytes in show_registers(), the markers around the byte at the fault address could make the printk() format string look like a valid log level and facility code. This would prevent this byte from being printed and result in a spurious newline: [ 7555.765589] Code: 8b 32 e9 94 00 00 00 81 7d 00 ff 00 00 00 0f 87 96 00 00 00 48 8b 83 c0 00 00 00 44 89 e2 44 89 e6 48 89 df 48 8b 80 d8 02 00 00 [ 7555.765683] 8b 48 28 48 89 d0 81 e2 ff 0f 00 00 48 c1 e8 0c 48 c1 e0 04 Add KERN_CONT where needed, and elsewhere in show_registers() for consistency. Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Link: http://lkml.kernel.org/r/4EEFA7AE.9020407@ladisch.de Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-12-19 13:09:56 -08:00
Fernando Luis Vazquez Cao	b49d7d877f	x86: Convert per-cpu counter icr_read_retry_count into a member of irq_stat LAPIC related statistics are grouped inside the per-cpu structure irq_stat, so there is no need for icr_read_retry_count to be a standalone per-cpu variable. This patch moves icr_read_retry_count to where it belongs. Suggested-y: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Cc: Jörn Engel <joern@logfs.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-18 10:46:48 +01:00
Ingo Molnar	6e5ed27637	Merge commit 'v3.2-rc6' into x86/platform	2011-12-18 10:35:16 +01:00
Ingo Molnar	a228b5892b	Merge branch 'mce-inject' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/mce	2011-12-18 09:18:45 +01:00
Chen Gong	2c29d9dd57	x86: add IRQ context simulation in module mce-inject mce-inject provides a mechanism to simulate errors so that test scripts can check for correct operation of the kernel without requiring any specialized hardware to create rare events. The existing code can simulate events in normal process context and also in NMI context - but not in IRQ context. This patch fills that gap. Link: https://lkml.org/lkml/2011/12/7/537 Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2011-12-16 11:20:02 -08:00
Timo Teräs	cb3f718de8	x86, centaur: Enable cx8 for VIA Eden too My box with following cpuinfo needs the cx8 enabling still: vendor_id : CentaurHauls cpu family : 6 model : 13 model name : VIA Eden Processor 1200MHz stepping : 0 cpu MHz : 1199.940 cache size : 128 KB This fixes valgrind to work on my box (it requires and checks cx8 from cpuinfo). Signed-off-by: Timo Teräs <timo.teras@iki.fi> Link: http://lkml.kernel.org/r/1323961888-10223-1-git-send-email-timo.teras@iki.fi Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-12-15 08:04:42 -08:00
Joerg Roedel	969df4b829	x86: Report cpb and eff_freq_ro flags correctly Add the flags to get rid of the [9] and [10] feature names in cpuinfo's 'power management' fields and replace them with meaningful names. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1323875574-17881-1-git-send-email-joerg.roedel@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-15 08:14:49 +01:00
Ingo Molnar	715a43182a	Merge branch 'early-mce-decode' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/mce	2011-12-15 08:13:40 +01:00
Ingo Molnar	304fb45374	Merge branch 'ucode-verify-size' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/microcode	2011-12-15 08:12:15 +01:00
Fenghua Yu	29e9bf1841	x86, mce, therm_throt: Don't report power limit and package level thermal throttle events in mcelog Thermal throttle and power limit events are not defined as MCE errors in x86 architecture and should not generate MCE errors in mcelog. Current kernel generates fake software defined MCE errors for these events. This may confuse users because they may think the machine has real MCE errors while actually only thermal throttle or power limit events happen. To make it worse, buggy firmware on some platforms may falsely generate the events. Therefore, kernel reports MCE errors which users think as real hardware errors. Although the firmware bugs should be fixed, on the other hand, kernel should not report MCE errors either. So mcelog is not a good mechanism to report these events. To report the events, we count them in respective counters (core_power_limit_count, package_power_limit_count, core_throttle_count, and package_throttle_count) in /sys/devices/system/cpu/cpu#/thermal_throttle/. Users can check the counters for each event on each CPU. Please note that all CPU's on one package report duplicate counters. It's user application's responsibity to retrieve a package level counter for one package. This patch doesn't report package level power limit, core level power limit, and package level thermal throttle events in mcelog. When the events happen, only report them in respective counters in sysfs. Since core level thermal throttle has been legacy code in kernel for a while and users accepted it as MCE error in mcelog, core level thermal throttle is still reported in mcelog. In the mean time, the event is counted in a counter in sysfs as well. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Borislav Petkov <bp@amd64.org> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20111215001945.GA21009@linux-os.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-12-14 16:25:26 -08:00
Borislav Petkov	0937195715	x86, MCE: Drain mcelog buffer Add a function which drains whatever MCEs were logged in already during boot and before the decoder chains were registered. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-12-14 12:50:13 +01:00
Borislav Petkov	3653ada5d3	x86, mce: Add wrappers for registering on the decode chain No functionality change, this is done so that in a follow-on patch all queued-up MCEs can be decoded after registering on the chain. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-12-14 12:50:12 +01:00
Borislav Petkov	597e11a367	x86, microcode, AMD: Update copyrights Add Andreas and me as current maintainers. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-12-14 12:46:52 +01:00
Borislav Petkov	d733689ad5	x86, microcode, AMD: Exit early on success Once we've found and validated the ucode patch for the current CPU, there's no need to iterate over the remaining patches in the binary image. Exit then and save us a bunch of cycles. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-12-14 12:46:52 +01:00
Borislav Petkov	be62adb492	x86, microcode, AMD: Simplify ucode verification Basically, what we did until now is take out a chunk of the firmware image, vmalloc space for it and inspect it before application. And repeat. This patch changes all that so that we look at each ucode patch from the firmware image, check it for sanity and copy it to local buffer for application only once and if it passes all checks. Thus, vmalloc-ing for each piece is gone, we can do proper size checking only of the patch which is destined for the CPU of the current machine instead of each single patch, which is clearly wrong. Oh yeah, simplify and cleanup the code while at it, along with adding comments as to what actually happens. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-12-14 12:46:51 +01:00
Borislav Petkov	96b0ee4588	x86, microcode, AMD: Add a reusable buffer Add a simple 4K page which gets allocated on driver init and freed on driver exit instead of vmalloc'ing small buffers for each ucode patch. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-12-14 12:46:50 +01:00
Borislav Petkov	f72c1a5765	x86, microcode, AMD: Add a vendor-specific exit function This will be used to do cleanup work before the driver exits. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-12-14 12:46:47 +01:00
Fernando Luis Vázquez Cao	346b46be5f	x86: Add per-cpu stat counter for APIC ICR read tries In the IPI delivery slow path (NMI delivery) we retry the ICR read to check for delivery completion a limited number of times. [ The reason for the limited retries is that some of the places where it is used (cpu boot, kdump, etc) IPI delivery might not succeed (due to a firmware bug or system crash, for example) and in such a case it is better to give up and resume execution of other code. ] This patch adds a new entry to /proc/interrupts, RTR, which tells user space the number of times we retried the ICR read in the IPI delivery slow path. This should give some insight into how well the APIC message delivery hardware is working - if the counts are way too large then we are hitting a (very-) slow path way too often. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Cc: Jörn Engel <joern@logfs.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/n/tip-vzsp20lo2xdzh5f70g0eis2s@git.kernel.org [ extended the changelog ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-14 09:32:05 +01:00
Ingo Molnar	919b83452b	Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu	2011-12-14 08:16:43 +01:00
Matt Fleming	291f36325f	x86, efi: EFI boot stub support There is currently a large divide between kernel development and the development of EFI boot loaders. The idea behind this patch is to give the kernel developers full control over the EFI boot process. As H. Peter Anvin put it, "The 'kernel carries its own stub' approach been very successful in dealing with BIOS, and would make a lot of sense to me for EFI as well." This patch introduces an EFI boot stub that allows an x86 bzImage to be loaded and executed by EFI firmware. The bzImage appears to the firmware as an EFI application. Luckily there are enough free bits within the bzImage header so that it can masquerade as an EFI application, thereby coercing the EFI firmware into loading it and jumping to its entry point. The beauty of this masquerading approach is that both BIOS and EFI boot loaders can still load and run the same bzImage, thereby allowing a single kernel image to work in any boot environment. The EFI boot stub supports multiple initrds, but they must exist on the same partition as the bzImage. Command-line arguments for the kernel can be appended after the bzImage name when run from the EFI shell, e.g. Shell> bzImage console=ttyS0 root=/dev/sdb initrd=initrd.img v7: - Fix checkpatch warnings. v6: - Try to allocate initrd memory just below hdr->inird_addr_max. v5: - load_options_size is UTF-16, which needs dividing by 2 to convert to the corresponding ASCII size. v4: - Don't read more than image->load_options_size v3: - Fix following warnings when compiling CONFIG_EFI_STUB=n arch/x86/boot/tools/build.c: In function ‘main’: arch/x86/boot/tools/build.c:138:24: warning: unused variable ‘pe_header’ arch/x86/boot/tools/build.c:138:15: warning: unused variable ‘file_sz’ - As reported by Matthew Garrett, some Apple machines have GOPs that don't have hardware attached. We need to weed these out by searching for ones that handle the PCIIO protocol. - Don't allocate memory if no initrds are on cmdline - Don't trust image->load_options_size Maarten Lankhorst noted: - Don't strip first argument when booted from efibootmgr - Don't allocate too much memory for cmdline - Don't update cmdline_size, the kernel considers it read-only - Don't accept '\n' for initrd names v2: - File alignment was too large, was 8192 should be 512. Reported by Maarten Lankhorst on LKML. - Added UGA support for graphics - Use VIDEO_TYPE_EFI instead of hard-coded number. - Move linelength assignment until after we've assigned depth - Dynamically fill out AddressOfEntryPoint in tools/build.c - Don't use magic number for GDT/TSS stuff. Requested by Andi Kleen - The bzImage may need to be relocated as it may have been loaded at a high address address by the firmware. This was required to get my macbook booting because the firmware loaded it at 0x7cxxxxxx, which triggers this error in decompress_kernel(), if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff)) error("Destination address too large"); Cc: Mike Waychison <mikew@google.com> Cc: Matthew Garrett <mjg@redhat.com> Tested-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/1321383097.2657.9.camel@mfleming-mobl1.ger.corp.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-12-12 14:26:10 -08:00
Keith Packard	e1ad783b12	Revert "x86, efi: Calling __pa() with an ioremap()ed address is invalid" This hangs my MacBook Air at boot time; I get no console messages at all. I reverted this on top of -rc5 and my machine boots again. This reverts commit `e8c7106280`. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Keith Packard <keithp@keithp.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Huang Ying <huang.ying.caritas@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-12 18:25:56 +01:00
Frederic Weisbecker	1268fbc746	nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu() Those two APIs were provided to optimize the calls of tick_nohz_idle_enter() and rcu_idle_enter() into a single irq disabled section. This way no interrupt happening in-between would needlessly process any RCU job. Now we are talking about an optimization for which benefits have yet to be measured. Let's start simple and completely decouple idle rcu and dyntick idle logics to simplify. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-12-11 10:31:57 -08:00
Frederic Weisbecker	98ad1cc14a	x86: Call idle notifier after irq_enter() Interrupts notify the idle exit state before calling irq_enter(). But the notifier code calls rcu_read_lock() and this is not allowed while rcu is in an extended quiescent state. We need to wait for irq_enter() -> rcu_idle_exit() to be called before doing so otherwise this results in a grumpy RCU: [ 0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110() [ 0.099991] Hardware name: AMD690VM-FMH [ 0.099991] Modules linked in: [ 0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255 [ 0.099991] Call Trace: [ 0.099991] <IRQ> [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0 [ 0.099991] [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20 [ 0.099991] [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110 [ 0.099991] [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20 [ 0.099991] [<ffffffff81001873>] exit_idle+0x43/0x50 [ 0.099991] [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0 [ 0.099991] [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20 [ 0.099991] <EOI> [<ffffffff8100ae67>] ? default_idle+0xa7/0x350 [ 0.099991] [<ffffffff8100ae65>] ? default_idle+0xa5/0x350 [ 0.099991] [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110 [ 0.099991] [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160 [ 0.099991] [<ffffffff810019a0>] cpu_idle+0xb0/0x110 [ 0.099991] [<ffffffff817a7505>] rest_init+0xe5/0x140 [ 0.099991] [<ffffffff817a7468>] ? rest_init+0x48/0x140 [ 0.099991] [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc [ 0.099991] [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135 [ 0.099991] [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Henroid <andrew.d.henroid@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-12-11 10:31:38 -08:00
Frederic Weisbecker	e37e112de3	x86: Enter rcu extended qs after idle notifier call The idle notifier, called by enter_idle(), enters into rcu read side critical section but at that time we already switched into the RCU-idle window (rcu_idle_enter() has been called). And it's illegal to use rcu_read_lock() in that state. This results in rcu reporting its bad mood: [ 1.275635] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110() [ 1.275635] Hardware name: AMD690VM-FMH [ 1.275635] Modules linked in: [ 1.275635] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #252 [ 1.275635] Call Trace: [ 1.275635] [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0 [ 1.275635] [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20 [ 1.275635] [<ffffffff817d6f22>] __atomic_notifier_call_chain+0xd2/0x110 [ 1.275635] [<ffffffff817d6f71>] atomic_notifier_call_chain+0x11/0x20 [ 1.275635] [<ffffffff810018a0>] enter_idle+0x20/0x30 [ 1.275635] [<ffffffff81001995>] cpu_idle+0xa5/0x110 [ 1.275635] [<ffffffff817a7465>] rest_init+0xe5/0x140 [ 1.275635] [<ffffffff817a73c8>] ? rest_init+0x48/0x140 [ 1.275635] [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc [ 1.275635] [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135 [ 1.275635] [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4 [ 1.275635] ---[ end trace a22d306b065d4a66 ]--- Fix this by entering rcu extended quiescent state later, just before the CPU goes to sleep. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-12-11 10:31:37 -08:00
Frederic Weisbecker	2bbb6817c0	nohz: Allow rcu extended quiescent state handling seperately from tick stop It is assumed that rcu won't be used once we switch to tickless mode and until we restart the tick. However this is not always true, as in x86-64 where we dereference the idle notifiers after the tick is stopped. To prepare for fixing this, add two new APIs: tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu(). If no use of RCU is made in the idle loop between tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch must instead call the new *_norcu() version such that the arch doesn't need to call rcu_idle_enter() and rcu_idle_exit(). Otherwise the arch must call tick_nohz_enter_idle() and tick_nohz_exit_idle() and also call explicitly: - rcu_idle_enter() after its last use of RCU before the CPU is put to sleep. - rcu_idle_exit() before the first use of RCU after the CPU is woken up. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: David Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-12-11 10:31:36 -08:00
Frederic Weisbecker	280f06774a	nohz: Separate out irq exit and idle loop dyntick logic The tick_nohz_stop_sched_tick() function, which tries to delay the next timer tick as long as possible, can be called from two places: - From the idle loop to start the dytick idle mode - From interrupt exit if we have interrupted the dyntick idle mode, so that we reprogram the next tick event in case the irq changed some internal state that requires this action. There are only few minor differences between both that are handled by that function, driven by the ts->inidle cpu variable and the inidle parameter. The whole guarantees that we only update the dyntick mode on irq exit if we actually interrupted the dyntick idle mode, and that we enter in RCU extended quiescent state from idle loop entry only. Split this function into: - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters dynticks idle mode unconditionally if it can, and enters into RCU extended quiescent state. - tick_nohz_irq_exit() which only updates the dynticks idle mode when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called). To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed into tick_nohz_idle_exit(). This simplifies the code and micro-optimize the irq exit path (no need for local_irq_save there). This also prepares for the split between dynticks and rcu extended quiescent state logics. We'll need this split to further fix illegal uses of RCU in extended quiescent states in the idle loop. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: David Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-12-11 10:31:35 -08:00
Matt Fleming	f7d7d01be5	x86: Don't use magic strings for EFI loader signature Introduce a symbol, EFI_LOADER_SIGNATURE instead of using the magic strings, which also helps to reduce the amount of ifdeffery. Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/1318848017-12301-1-git-send-email-matt@console-pimps.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-12-09 17:35:38 -08:00
Matt Fleming	e8c7106280	x86, efi: Calling __pa() with an ioremap()ed address is invalid If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set in ->attribute we currently call set_memory_uc(), which in turn calls __pa() on a potentially ioremap'd address. On CONFIG_X86_32 this is invalid, resulting in the following oops on some machines: BUG: unable to handle kernel paging request at f7f22280 IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210 [...] Call Trace: [<c104f8ca>] ? page_is_ram+0x1a/0x40 [<c1025aff>] reserve_memtype+0xdf/0x2f0 [<c1024dc9>] set_memory_uc+0x49/0xa0 [<c19334d0>] efi_enter_virtual_mode+0x1c2/0x3aa [<c19216d4>] start_kernel+0x291/0x2f2 [<c19211c7>] ? loglevel+0x1b/0x1b [<c19210bf>] i386_start_kernel+0xbf/0xc8 A better approach to this problem is to map the memory region with the correct attributes from the start, instead of modifying it after the fact. The uncached case can be handled by ioremap_nocache() and the cached by ioremap_cache(). Despite first impressions, it's not possible to use ioremap_cache() to map all cached memory regions on CONFIG_X86_64 because EFI_RUNTIME_SERVICES_DATA regions really don't like being mapped into the vmalloc space, as detailed in the following bug report, https://bugzilla.redhat.com/show_bug.cgi?id=748516 Therefore, we need to ensure that any EFI_RUNTIME_SERVICES_DATA regions are covered by the direct kernel mapping table on CONFIG_X86_64. To accomplish this we now map E820_RESERVED_EFI regions via the direct kernel mapping with the initial call to init_memory_mapping() in setup_arch(), whereas previously these regions wouldn't be mapped if they were after the last E820_RAM region until efi_ioremap() was called. Doing it this way allows us to delete efi_ioremap() completely. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Huang Ying <huang.ying.caritas@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console-pimps.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-09 08:32:26 +01:00
Borislav Petkov	d059f24a96	x86, CPU: Drop superfluous get_cpu_cap() prototype The get_cpu_cap() external function prototype was declared twice so lose one of them. Clean up the header guard while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1322594083-14507-1-git-send-email-bp@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-09 08:00:53 +01:00
Mark Langsdorf	2ded6e6a94	x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked When HPET is operating in RTC mode, the TN_ENABLE bit on timer1 controls whether the HPET or the RTC delivers interrupts to irq8. When the system goes into suspend, the RTC driver sends a signal to the HPET driver so that the HPET releases control of irq8, allowing the RTC to wake the system from suspend. The switchover is accomplished by a write to the HPET configuration registers which currently only occurs while servicing the HPET interrupt. On some systems, I have seen the system suspend before an HPET interrupt occurs, preventing the write to the HPET configuration register and leaving the HPET in control of the irq8. As the HPET is not active during suspend, it does not generate a wake signal and RTC alarms do not work. This patch forces the HPET driver to immediately transfer control of the irq8 channel to the RTC instead of waiting until the next interrupt event. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Link: http://lkml.kernel.org/r/20111118153306.GB16319@alberich.amd.com Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org	2011-12-08 21:47:22 +01:00
Tejun Heo	1aadc0560f	memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users The only function of memblock_analyze() is now allowing resize of memblock region arrays. Rename it to memblock_allow_resize() and update its users. * The following users remain the same other than renaming. arm/mm/init.c::arm_memblock_init() microblaze/kernel/prom.c::early_init_devtree() powerpc/kernel/prom.c::early_init_devtree() openrisc/kernel/prom.c::early_init_devtree() sh/mm/init.c::paging_init() sparc/mm/init_64.c::paging_init() unicore32/mm/init.c::uc32_memblock_init() * In the following users, analyze was used to update total size which is no longer necessary. powerpc/kernel/machine_kexec.c::reserve_crashkernel() powerpc/kernel/prom.c::early_init_devtree() powerpc/mm/init_32.c::MMU_init() powerpc/mm/tlb_nohash.c::__early_init_mmu() powerpc/platforms/ps3/mm.c::ps3_mm_add_memory() powerpc/platforms/embedded6xx/wii.c::wii_memory_fixups() sh/kernel/machine_kexec.c::reserve_crashkernel() * x86/kernel/e820.c::memblock_x86_fill() was directly setting memblock_can_resize before populating memblock and calling analyze afterwards. Call memblock_allow_resize() before start populating. memblock_can_resize is now static inside memblock.c. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-12-08 10:22:08 -08:00
Tejun Heo	fe091c208a	memblock: Kill memblock_init() memblock_init() initializes arrays for regions and memblock itself; however, all these can be done with struct initializers and memblock_init() can be removed. This patch kills memblock_init() and initializes memblock with struct initializer. The only difference is that the first dummy entries don't have .nid set to MAX_NUMNODES initially. This doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-12-08 10:22:07 -08:00
Dan Carpenter	3d6240b53e	x86, NMI: Add to_cpumask() to silence compile warning Gcc complains if we don't cast this to a struct cpumask pointer. arch/x86/kernel/nmi_selftest.c:93:2: warning: passing argument 1 of ‘cpumask_empty’ from incompatible pointer type [enabled by default] Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/r/20111207110612.GA3437@mwanda Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-07 23:26:57 +01:00
Mitsuo Hayasaka	d2db661021	x86: Add stack top margin for stack overflow checking It seems that a margin for stack overflow checking is added to top of a kernel stack but is not added to IRQ and exception stacks in stack_overflow_check(). Therefore, the overflows of IRQ and exception stacks are always detected only after they actually occurred and data corruption might occur due to them. This patch adds the margin to top of IRQ and exception stacks as well as a kernel stack to enhance reliability. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20111207082910.9847.3359.stgit@ltc219.sdl.hitachi.co.jp [ removed the #undef - we typically don't do that for uncommon names ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-07 09:27:11 +01:00
Don Zickus	565cbc3e93	x86, NMI: NMI-selftest should handle the UP case properly If no remote cpus are online, then just quietly skip the remote IPI test for now. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: andi@firstfloor.org Cc: torvalds@linux-foundation.org Cc: peterz@infradead.org Cc: robert.richter@amd.com Link: http://lkml.kernel.org/r/20111206180859.GR1669@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 21:02:39 +01:00
Gleb Natapov	b3d9468a8b	perf, x86: Expose perf capability to other modules KVM needs to know perf capability to decide which PMU it can expose to a guest. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1320929850-10480-8-git-send-email-gleb@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 20:41:08 +01:00
Peter Zijlstra	c1d6f42f1a	perf, x86: Implement arch event mask as quirk Implement the disabling of arch events as a quirk so that we can print a message along with it. This creates some visibility into the problem space and could allow us to work on adding more work-around like the AAJ80 one. Requested-by: Ingo Molnar <mingo@elte.hu> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-wcja2z48wklzu1b0nkz0a5y7@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 20:41:06 +01:00
Gleb Natapov	ffb871bc91	x86, perf: Disable non available architectural events Intel CPUs report non-available architectural events in cpuid leaf 0AH.EBX. Use it to disable events that are not available according to CPU. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1320929850-10480-7-git-send-email-gleb@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 20:41:05 +01:00
Peter Zijlstra	9cdbe1cbac	jump_label, x86: Fix section mismatch WARNING: arch/x86/kernel/built-in.o(.text+0x4c71): Section mismatch in reference from the function arch_jump_label_transform_static() to the function .init.text:text_poke_early() The function arch_jump_label_transform_static() references the function __init text_poke_early(). This is often because arch_jump_label_transform_static lacks a __init annotation or the annotation of text_poke_early is wrong. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jason Baron <jbaron@redhat.com> Link: http://lkml.kernel.org/n/tip-9lefe89mrvurrwpqw5h8xm8z@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 20:41:02 +01:00
Seiichi Ikarashi	1cf8343f55	x86: Fix rflags in FAKE_STACK_FRAME The x86_64 kernel pushes the fake kernel stack in arch/x86/kernel/entry_64.S:FAKE_STACK_FRAME, and rflags register in it does not conform to the specification. Although Intel's manual[1] says bit 1 of it shall be set to 1, this bit is cleared to 0 on pushing the fake stack. [1] Intel(R) 64 and IA-32 Architectures Software Developer's Manual Vol.1 3-21 Figure 3-8. EFLAGS Register If it is not on purpose, it is better to be fixed, because it can lead some tools misunderstanding the stack frame. For example, "crash" utility[2] actually detects it and warns you like below: RIP: ffffffff8005dfa2 RSP: ffff8104ce0c7f58 RFLAGS: 00000200 [...] bt: WARNING: possibly bogus exception frame Signed-off-by: Seiichi Ikarashi <s.ikarashi@jp.fujitsu.com> Tested-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 10:02:38 +01:00
Peter Zijlstra	4defea8559	perf, x86: Prefer fixed-purpose counters when scheduling This avoids a scheduling failure for cases like: cycles, cycles, instructions, instructions (on Core2) Which would end up being programmed like: PMC0, PMC1, FP-instructions, fail Because all events will have the same weight. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-8tnwb92asqj7xajqqoty4gel@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 08:33:58 +01:00
Robert Richter	bc1738f6ee	perf, x86: Fix event scheduler for constraints with overlapping counters The current x86 event scheduler fails to resolve scheduling problems of certain combinations of events and constraints. This happens if the counter mask of such an event is not a subset of any other counter mask of a constraint with an equal or higher weight, e.g. constraints of the AMD family 15h pmu: counter mask weight amd_f15_PMC30 0x09 2 <--- overlapping counters amd_f15_PMC20 0x07 3 amd_f15_PMC53 0x38 3 The scheduler does not find then an existing solution. Here is an example: event code counter failure possible solution 0x02E PMC[3,0] 0 3 0x043 PMC[2:0] 1 0 0x045 PMC[2:0] 2 1 0x046 PMC[2:0] FAIL 2 The event scheduler may not select the correct counter in the first cycle because it needs to know which subsequent events will be scheduled. It may fail to schedule the events then. To solve this, we now save the scheduler state of events with overlapping counter counstraints. If we fail to schedule the events we rollback to those states and try to use another free counter. Constraints with overlapping counters are marked with a new introduced overlap flag. We set the overlap flag for such constraints to give the scheduler a hint which events to select for counter rescheduling. The EVENT_CONSTRAINT_OVERLAP() macro can be used for this. Care must be taken as the rescheduling algorithm is O(n!) which will increase scheduling cycles for an over-commited system dramatically. The number of such EVENT_CONSTRAINT_OVERLAP() macros and its counter masks must be kept at a minimum. Thus, the current stack is limited to 2 states to limit the number of loops the algorithm takes in the worst case. On systems with no overlapping-counter constraints, this implementation does not increase the loop count compared to the previous algorithm. V2: * Renamed redo -> overlap. * Reimplementation using perf scheduling helper functions. V3: * Added WARN_ON_ONCE() if out of save states. * Changed function interface of perf_sched_restore_state() to use bool as return value. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 08:33:56 +01:00
Robert Richter	1e2ad28f80	perf, x86: Implement event scheduler helper functions This patch introduces x86 perf scheduler code helper functions. We need this to later add more complex functionality to support overlapping counter constraints (next patch). The algorithm is modified so that the range of weight values is now generated from the constraints. There shouldn't be other functional changes. With the helper functions the scheduler is controlled. There are functions to initialize, traverse the event list, find unused counters etc. The scheduler keeps its own state. V3: * Added macro for_each_set_bit_cont(). * Changed functions interfaces of perf_sched_find_counter() and perf_sched_next_event() to use bool as return value. * Added some comments to make code better understandable. V4: * Fix broken event assignment if weight of the first event is not wmin (perf_sched_init()). Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1321616122-1533-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 08:33:54 +01:00
Srikar Dronamraju	cc3a1bf52a	x86: Clean up and extend do_int3() Since there is a possibility of !KPROBES int3 listeners (such as kgdb) and since DIE_TRAP is currently not being used by anybody, notify all listeners with DIE_INT3. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20111025142159.GB21225@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 08:20:37 +01:00
Srikar Dronamraju	3596ff4e6b	x86: Call do_notify_resume() with interrupts enabled do_notify_resume() gets called with interrupts disabled on x86_32. This is different from the x86_64 behavior, where interrupts are enabled at the time. Queries on lkml on this issue hasn't yielded any clear answer. Lets make x86_32 behave the same as x86_64, unless there is a real reason to maintain status quo. Please refer https://lkml.org/lkml/2011/9/27/130 for more details. A similar change was suggested in ARM: https://lkml.org/lkml/2011/8/25/231 My 32-bit machine works fine (tm) with this patch. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20111025141812.GA21225@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 08:20:34 +01:00
Steffen Persvold	e4a02b4a95	x86: Fix the !CONFIG_NUMA build of the new CPU ID fixup code support I used "ifdef CONFIG_NUMA" simply because it doesn't make sense in a non-numa configuration even with SMP enabled. Besides, the only place where it is called right now is in kernel/cpu/amd.c:srat_detect_node() within the "CONFIG_NUMA" protected part. Signed-off-by: Steffen Persvold <sp@numascale.com> Cc: Daniel J Blueman <daniel@numascale-asia.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Link: http://lkml.kernel.org/r/1323073238-32686-2-git-send-email-daniel@numascale-asia.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 07:03:37 +01:00
Linus Torvalds	45e713efe2	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intr_remapping: Fix section mismatch in ir_dev_scope_init() intel-iommu: Fix section mismatch in dmar_parse_rmrr_atsr_dev() x86, amd: Fix up numa_node information for AMD CPU family 15h model 0-0fh northbridge functions x86, AMD: Correct align_va_addr documentation x86/rtc, mrst: Don't register a platform RTC device for for Intel MID platforms x86/mrst: Battery fixes x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode x86: Fix "Acer Aspire 1" reboot hang x86/mtrr: Resolve inconsistency with Intel processor manual x86: Document rdmsr_safe restrictions x86, microcode: Fix the failure path of microcode update driver init code Add TAINT_FIRMWARE_WORKAROUND on MTRR fixup x86/mpparse: Account for bus types other than ISA and PCI x86, mrst: Change the pmic_gpio device type to IPC mrst: Added some platform data for the SFI translations x86,mrst: Power control commands update x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot x86, UV: Fix UV2 hub part number x86: Add user_mode_vm check in stack_overflow_check	2011-12-05 16:54:15 -08:00
Linus Torvalds	232ea34455	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix loss of notification with multi-event perf, x86: Force IBS LVT offset assignment for family 10h perf, x86: Disable PEBS on SandyBridge chips trace_events_filter: Use rcu_assign_pointer() when setting ftrace_event_call->filter perf session: Fix crash with invalid CPU list perf python: Fix undefined symbol problem perf/x86: Enable raw event access to Intel offcore events perf: Don't use -ENOSPC for out of PMU resources perf: Do not set task_ctx pointer in cpuctx if there are no events in the context perf/x86: Fix PEBS instruction unwind oprofile, x86: Fix crash when unloading module (nmi timer mode) oprofile: Fix crash when unloading module (hr timer mode)	2011-12-05 16:54:00 -08:00
Thomas Gleixner	0518469d0a	Merge branch 'fortglx/3.3/tip/timers/core' of git://git.linaro.org/people/jstultz/linux into timers/core	2011-12-05 22:13:49 +01:00
Andreas Herrmann	f62ef5f3e9	x86, amd: Fix up numa_node information for AMD CPU family 15h model 0-0fh northbridge functions I've received complaints that the numa_node attribute for family 15h model 00-0fh (e.g. Interlagos) northbridge functions shows -1 instead of the proper node ID. Correct this with attached quirks (similar to quirks for other AMD CPU families used in multi-socket systems). Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Frank Arnold <frank.arnold@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/20111202072143.GA31916@alberich.amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 18:13:11 +01:00
Suresh Siddha	28a00184be	x86, tsc: Skip TSC synchronization checks for tsc=reliable tsc=reliable boot parameter is supposed to skip all the TSC stablility checks during boot time. On a 8-socket system where we want to run an experiment with the "tsc=reliable" boot option, TSC synchronization checks are not getting skipped and marking the TSC as not stable. Check for tsc_clocksource_reliable (which is set via tsc=reliable or for platforms supporting synthetic TSC_RELIABLE feature bit etc) and when set, skip the TSC synchronization tests during boot. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: John Stultz <johnstul@us.ibm.com> Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1320446537.15071.14.camel@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 18:00:31 +01:00
Jan Beulich	f6b2bc8476	x86-64: Cleanup some assembly entry points system_call_after_swapgs doesn't really benefit from forcing alignment from it - quite the opposite, native code needlessly so far got a big NOP instruction inserted in front of it. Xen being the only user of the separate entry point can well live with the branch going to three bytes into a cache line. The compatibility mode ptregs entry points for one can make use of the GLOBAL() macro, and should be suitably aligned. Their shared continuation point (ia32_ptregs_common) otoh doesn't need to be global at all, but should continue to be properly aligned. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/4ED4CEEA020000780006407D@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 17:24:43 +01:00
Jan Beulich	46db09d3fd	x86-64: Slightly shorten line system call entry and exit paths GET_THREAD_INFO() involves a memory read immediately followed by an "sub" on the value read, in turn (in several cases) immediately followed by a use of the calculated value as the base address of a memory access. This combination of instructions has a non-negligible potential for stalls. In the system call entry point code, however, the (fixed) offset of the stack pointer from the end of the stack is generally known, and hence we can instead avoid the memory load and subtract, and instead do the memory reference using %rsp as the base register. To do so in a legible fashion, introduce a THREAD_INFO() macro which, provided a register (generally %rsp) and the known offset from the end of the stack, produces a suitable memory access operand. The patch attempts to only touch the fast paths (no auditing and alike), but manages to do so only in the 64-bit entry point case; the compatibility mode entry points have so many interdependencies between their various branch targets that it was necessary to also adjust the slow paths to eliminate the risk of having missed some register dependency during code analysis. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/4ED4CD690200007800064075@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 17:24:41 +01:00
Jan Beulich	39e9543344	x86-64: Reduce amount of redundant code generated for invalidate_interruptNN Previously these up to 32 entry points, consisting of all the same code except for their very first instruction, consumed 0x70 bytes per instance. Just like for device interrupt entry points, fold them together so that they all use a single instance of the code after having pushed their vector indicator (resulting in 0x10 bytes per instance, to retain 16-byte alignment of the individual entry points). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/4ED4CA230200007800064065@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 17:24:39 +01:00
Jan Beulich	70ea6855d3	x86-64: Slightly shorten int_ret_from_sys_call Testing for a return to ring 0 was necessary here solely because of the branch out of ret_from_fork. That branch, however, can be directed to retint_restore_args, and thus the test-and-branch can be eliminated here. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/4ED4C7EE0200007800064028@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 17:24:37 +01:00
Steffen Persvold	44b111b519	x86: Add NumaChip support Adds support for Numascale NumaChip large-SMP systems. It is needed to enable the booting of more than ~168 cores. v2: - [Steffen] enumerate only accessible northbridges - [Daniel] rediffed and validated against 3.1-rc10 v3: - [Daniel] use x86_init core numbering override - [Daniel] cleanups as per feedback v4: - [Daniel] use updated x86_cpuinit override v5: - drop disabling interrupts locally, as ISR write is atomic; drop delay - added read-mostly annotations where appropriate - require CONFIG_SMP, so drop conditional path Workload tested on 96 cores/16 sockets. Signed-off-by: Steffen Persvold <sp@numascale.com> Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Link: http://lkml.kernel.org/r/1323101246-2400-1-git-send-email-daniel@numascale-asia.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 17:17:24 +01:00
Daniel J Blueman	64be4c1c24	x86: Add x86_init platform override to fix up NUMA core numbering Add an x86_init vector for handling inconsistent core numbering. This is useful for multi-fabric platforms, such as Numascale NumaConnect. v2: - use struct x86_cpuinit_ops - provide default fall-back function to warn Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Cc: Steffen Persvold <sp@numascale.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Link: http://lkml.kernel.org/r/1323073238-32686-2-git-send-email-daniel@numascale-asia.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 17:17:21 +01:00
Daniel J Blueman	9a0ebfbe3f	x86: Make flat_init_apic_ldr() available Allow flat_init_apic_ldr() to be used outside the compilation unit for similar APIC implementations. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Cc: Steffen Persvold <sp@numascale.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Link: http://lkml.kernel.org/r/1323073238-32686-1-git-send-email-daniel@numascale-asia.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 17:17:07 +01:00
Jack Steiner	b565201cf7	x86: Reduce clock calibration time during slave cpu startup Reduce the startup time for slave cpus. Adds hooks for an arch-specific function for clock calibration. These hooks are used on x86. If a newly started cpu has the same phys_proc_id as a core already active, uses the TSC for the delay loop and has a CONSTANT_TSC, use the already-calculated value of loops_per_jiffy. This patch reduces the time required to start slave cpus on a 4096 cpu system from: 465 sec OLD 62 sec NEW This reduces boot time on a 4096p system by almost 7 minutes. Nice... Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: John Stultz <john.stultz@linaro.org> [fix CONFIG_SMP=n build] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 17:12:43 +01:00
H Hartley Sweeten	706d9a9c8b	arch/x86/kernel/e820.c: quiet sparse noise about plain integer as NULL pointer The last parameter to sort() is a pointer to the function used to swap items. This parameter should be NULL, not 0, when not used. This quiets the following sparse warning: warning: Using plain integer as NULL pointer Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: hartleys@visionengravers.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 17:10:20 +01:00
Mathias Nyman	35d4769962	x86/rtc, mrst: Don't register a platform RTC device for for Intel MID platforms Intel MID x86 platforms have a memory mapped virtual RTC instead. No MID platform have the default ports (and accessing them may do weird stuff). Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: feng.tang@intel.com Cc: Feng Tang <feng.tang@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 17:09:21 +01:00
Mike Ditto	d1bbdd6692	arch/x86/kernel/e820.c: Eliminate bubble sort from sanitize_e820_map() Replace the bubble sort in sanitize_e820_map() with a call to the generic kernel sort function to avoid pathological performance with large maps. On large (thousands of entries) E820 maps, the previous code took minutes to run; with this change it's now milliseconds. Signed-off-by: Mike Ditto <mditto@google.com> Cc: sassmann@kpanic.de Cc: yuenn@google.com Cc: Stefan Assmann <sassmann@kpanic.de> Cc: Nancy Yuen <yuenn@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 17:08:14 +01:00
H Hartley Sweeten	98b8b99ae1	arch/x86/kernel/ptrace.c: Quiet sparse noise ptrace_set_debugreg() is only used in this file and should be static. This also quiets the following sparse warning: warning: symbol 'ptrace_set_debugreg' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: hartleys@visionengravers.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 16:48:23 +01:00
Ingo Molnar	f1b23714cb	Merge branch 'ucode' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent	2011-12-05 16:38:51 +01:00
Peter Chubb	1ef0389096	x86: Fix "Acer Aspire 1" reboot hang Looks like on some Acer Aspire 1s with older bioses, reboot via bios fails. It works on my machine, (with BIOS version 0.3310) but not on some others (BIOS version 0.3309). There's a log of problems at: https://bbs.archlinux.org/viewtopic.php?id=124136 This patch adds a different callback to the reboot quirk table, to allow rebooting via keybaord controller. Reported-by: Uroš Vampl <mobile.leecher@gmail.com> Tested-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Peter Chubb <peter.chubb@nicta.com.au> Cc: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1323093233-9481-1-git-send-email-anarsoul@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 15:06:17 +01:00
Ajaykumar Hotchandani	8dbf4a3003	x86/mtrr: Resolve inconsistency with Intel processor manual Following is from Notes of section 11.5.3 of Intel processor manual available at: http://www.intel.com/Assets/PDF/manual/325384.pdf For the Pentium 4 and Intel Xeon processors, after the sequence of steps given above has been executed, the cache lines containing the code between the end of the WBINVD instruction and before the MTRRS have actually been disabled may be retained in the cache hierarchy. Here, to remove code from the cache completely, a second WBINVD instruction must be executed after the MTRRs have been disabled. This patch provides resolution for that. Ideally, I will like to make changes only for Pentium 4 and Xeon processors. But, I am not finding easier way to do it. And, extra wbinvd() instruction does not hurt much for other processors. Signed-off-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi> Link: http://lkml.kernel.org/r/4EBD1CC5.3030008@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 15:06:15 +01:00
Srivatsa S. Bhat	bd39906397	x86, microcode: Fix the failure path of microcode update driver init code The microcode update driver's initialization code does not handle failures correctly. This patch fixes this issue. Signed-off-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20111107123530.12164.31227.stgit@srivatsabhat.in.ibm.com Link: http://lkml.kernel.org/r/4ED8E2270200007800065120@nat28.tlf.novell.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-12-05 14:21:01 +01:00
Alan Cox	1ea7c6737c	x86/config: Revamp configuration for MID devices This follows on from the patch applied in 3.2rc1 which creates an INTEL_MID configuration. We can now add the entry for Medfield specific code. After this is merged the final patch will be submitted which moves the rest of the device Kconfig dependancies to MRST/MEDFIELD/INTEL_MID as appropriate. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 14:02:46 +01:00
Thomas Meyer	cced402299	x86: Use kmemdup() in copy_thread(), rather than duplicating its implementation The semantic patch that makes this change is available in scripts/coccinelle/api/memdup.cocci. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Link: http://lkml.kernel.org/r/1321569820.1624.275.camel@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 13:54:39 +01:00
Ferenc Wagner	3f7787b36c	x86: Replace the EVT_TO_HPET_DEV() macro with an inline function The original macro worked only when applied to variables named 'evt'. While this could have been fixed by simply renaming the macro argument, a more type-safe replacement is preferred. Signed-off-by: Ferenc Wagner <wferi@niif.hu> Cc: Venkatesh Pallipadi $Venki$ <venki@google.com> Link: http://lkml.kernel.org/r/8ed5c66c02041226e8cf8b4d5d6b41e543d90bd6.1321626272.git.wferi@niif.hu Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 13:53:36 +01:00
Prarit Bhargava	644ddf588f	Add TAINT_FIRMWARE_WORKAROUND on MTRR fixup TAINT_FIRMWARE_WORKAROUND should be set when an MTRR fixup is done. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/1318958650-12447-1-git-send-email-prarit@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 13:48:50 +01:00
Bjorn Helgaas	9e6866686b	x86/mpparse: Account for bus types other than ISA and PCI In commit `f8924e770e` ("x86: unify mp_bus_info"), the 32-bit and 64-bit versions of MP_bus_info were rearranged to match each other better. Unfortunately it introduced a regression: prior to that change we used to always set the mp_bus_not_pci bit, then clear it if we found a PCI bus. After it, we set mp_bus_not_pci for ISA buses, clear it for PCI buses, and leave it alone otherwise. In the cases of ISA and PCI, there's not much difference. But ISA is not the only non-PCI bus, so it's better to always set mp_bus_not_pci and clear it only for PCI. Without this change, Dan's Dell PowerEdge 4200 panics on boot with a log indicating interrupt routing trouble unless the "noapic" option is supplied. With this change, the machine boots reliably without "noapic". Fixes http://bugs.debian.org/586494 Reported-bisected-and-tested-by: Dan McGrath <troubledaemon@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # 2.6.26+ Cc: Dan McGrath <troubledaemon@gmail.com> Cc: Alexey Starikovskiy <aystarik@gmail.com> [jrnieder@gmail.com: clarified commit message] Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Link: http://lkml.kernel.org/r/20111122215000.GA9151@elie.hsd1.il.comcast.net Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 13:46:27 +01:00
Ingo Molnar	53b5650273	x86: Fix the 32-bit stackoverflow-debug build The panic_on_stackoverflow variable needs to be avilable on the 32-bit side as well ... Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: http://lkml.kernel.org/r/20111129060836.11076.12323.stgit@ltc219.sdl.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 12:25:44 +01:00
Rafael J. Wysocki	6be30bb7d7	x86/reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot Dell OptiPlex 990 is known to require PCI reboot, so add it to the reboot blacklist in pci_reboot_dmi_table[]. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Link: http://lkml.kernel.org/r/201111160019.51303.rjw@sisk.pl Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 12:20:43 +01:00
Andy Lutomirski	2e57ae0515	x86: Default to vsyscall=emulate This essentially reverts: `2b666859ec`: x86: Default to vsyscall=native for now The ABI breakage should now be fixed by: commit 48c4206f5b02f28c4c78a1f5b491d3772fb64fb9 Author: Andy Lutomirski <luto@mit.edu> Date: Thu Oct 20 08:48:19 2011 -0700 x86-64: Set siginfo and context on vsyscall emulation faults Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Adrian Bunk <bunk@stusta.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/93154af3b2b6d208906ae02d80d92cf60c6fa94f.1320712291.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 12:17:29 +01:00
Andy Lutomirski	4fc3490114	x86-64: Set siginfo and context on vsyscall emulation faults To make this work, we teach the page fault handler how to send signals on failed uaccess. This only works for user addresses (kernel addresses will never hit the page fault handler in the first place), so we need to generate signals for those separately. This gets the tricky case right: if the user buffer spans multiple pages and only the second page is invalid, we set cr2 and si_addr correctly. UML relies on this behavior to "fault in" pages as needed. We steal a bit from thread_info.uaccess_err to enable this. Before this change, uaccess_err was a 32-bit boolean value. This fixes issues with UML when vsyscall=emulate. Reported-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/4c8f91de7ec5cd2ef0f59521a04e1015f11e42b4.1320712291.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 12:17:27 +01:00
Don Zickus	bda6263398	x86, NMI: Add knob to disable using NMI IPIs to stop cpus Some machines may exhibit problems using the NMI to stop other cpus. This knob just allows one to revert back to the original behaviour to help diagnose the problem. V2: make function static Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> Cc: seiji.aguchi@hds.com Cc: vgoyal@redhat.com Cc: mjg@redhat.com Cc: tony.luck@intel.com Cc: gong.chen@intel.com Cc: satoru.moriya@hds.com Cc: avi@redhat.com Cc: Andi Kleen <andi@firstfloor.org> Link: http://lkml.kernel.org/r/1318533267-18880-4-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 12:00:23 +01:00
Don Zickus	99e8b9ca90	x86, NMI: Add NMI IPI selftest The previous patch modified the stop cpus path to use NMI instead of IRQ as the way to communicate to the other cpus to shutdown. There were some concerns that various machines may have problems with using an NMI IPI. This patch creates a selftest to check if NMI is working at boot. The idea is to help catch any issues before the machine panics and we learn the hard way. Loosely based on the locking-selftest.c file, this separate file runs a couple of simple tests and reports the results. The output looks like: ... Brought up 4 CPUs ---------------- \| NMI testsuite: -------------------- remote IPI: ok \| local IPI: ok \| -------------------- Good, all 2 testcases passed! \| --------------------------------- Total of 4 processors activated (21330.61 BogoMIPS). ... Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> Cc: seiji.aguchi@hds.com Cc: vgoyal@redhat.com Cc: mjg@redhat.com Cc: tony.luck@intel.com Cc: gong.chen@intel.com Cc: satoru.moriya@hds.com Cc: avi@redhat.com Cc: Andi Kleen <andi@firstfloor.org> Link: http://lkml.kernel.org/r/1318533267-18880-3-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 12:00:16 +01:00
Don Zickus	3603a2512f	x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus A recent discussion started talking about the locking on the pstore fs and how it relates to the kmsg infrastructure. We noticed it was possible for userspace to r/w to the pstore fs (grabbing the locks in the process) and block the panic path from r/w to the same fs. The reason was the cpu with the lock could be doing work while the crashing cpu is panic'ing. Busting those spinlocks might cause those cpus to step on each other's data. Fine, fair enough. It was suggested it would be nice to serialize the panic path (ie stop the other cpus) and have only one cpu running. This would allow us to bust the spinlocks and not worry about another cpu stepping on the data. Of course, smp_send_stop() does this in the panic case. kmsg_dump() would have to be moved to be called after it. Easy enough. The only problem is on x86 the smp_send_stop() function calls the REBOOT_VECTOR. Any cpu with irqs disabled (which pstore and its backend ERST would do), block this IPI and thus do not stop. This makes it difficult to reliably log data to the pstore fs. The patch below switches from the REBOOT_VECTOR to NMI (and mimics what kdump does). Switching to NMI allows us to deliver the IPI when irqs are disabled, increasing the reliability of this function. However, Andi carefully noted that on some machines this approach does not work because of broken BIOSes or whatever. To help accomodate this, the next couple of patches will run a selftest and provide a knob to disable. V2: uses atomic ops to serialize the cpu that shuts everyone down V3: comment cleanup Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> Cc: seiji.aguchi@hds.com Cc: vgoyal@redhat.com Cc: mjg@redhat.com Cc: tony.luck@intel.com Cc: gong.chen@intel.com Cc: satoru.moriya@hds.com Cc: avi@redhat.com Cc: Andi Kleen <andi@firstfloor.org> Link: http://lkml.kernel.org/r/1318533267-18880-2-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 12:00:14 +01:00
Jack Steiner	b495e039b4	x86, UV: Fix UV2 hub part number There was a mixup when the SGI UV2 hub chip was sent to be fabricated, and it ended up with the wrong part number in the HRP_NODE_ID mmr. Future versions of the chip will (may) have the correct part number. Change the UV infrastructure to recognize both part numbers as valid IDs of a UV2 hub chip. Signed-off-by: Jack Steiner <steiner@sgi.com> Link: http://lkml.kernel.org/r/20111129210058.GA20452@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 11:49:52 +01:00
Mitsuo Hayasaka	467e6b7a7c	x86: Clean up the range of stack overflow checking The overflow checking of kernel stack checks if the stack pointer points to the available kernel stack range, which is derived from the original overflow checking. It is clear that curbase address is always less than low boundary of available kernel stack. So, this patch removes the first condition that checks if the pointer is higher than curbase. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Randy Dunlap <rdunlap@xenotime.net> Link: http://lkml.kernel.org/r/20111129060845.11076.40916.stgit@ltc219.sdl.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-12-05 11:37:48 +01:00
Mitsuo Hayasaka	55af77969f	x86: Panic on detection of stack overflow Currently, messages are just output on the detection of stack overflow, which is not sufficient for systems that need a high reliability. This is because in general the overflow may corrupt data, and the additional corruption may occur due to reading them unless systems stop. This patch adds the sysctl parameter kernel.panic_on_stackoverflow and causes a panic when detecting the overflows of kernel, IRQ and exception stacks except user stack according to the parameter. It is disabled by default. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: http://lkml.kernel.org/r/20111129060836.11076.12323.stgit@ltc219.sdl.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 11:37:47 +01:00
Mitsuo Hayasaka	37fe6a42b3	x86: Check stack overflow in detail Currently, only kernel stack is checked for the overflow, which is not sufficient for systems that need a high reliability. To enhance it, it is required to check the IRQ and exception stacks, as well. This patch checks all the stack types and will cause messages of stacks in detail when free stack space drops below a certain limit except user stack. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Randy Dunlap <rdunlap@xenotime.net> Link: http://lkml.kernel.org/r/20111129060829.11076.51733.stgit@ltc219.sdl.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-12-05 11:37:45 +01:00
Mitsuo Hayasaka	69682b625a	x86: Add user_mode_vm check in stack_overflow_check The kernel stack overflow is checked in stack_overflow_check(), which may wrongly detect the overflow if the stack pointer in user space points to the kernel stack intentionally or accidentally. So, the actual overflow is never detected after this misdetection because WARN_ONCE() is used on the detection of it. This patch adds user-mode-vm checking before it to avoid this problem and bails out early if the user stack is used. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Randy Dunlap <rdunlap@xenotime.net> Link: http://lkml.kernel.org/r/20111129060821.11076.55315.stgit@ltc219.sdl.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-12-05 11:28:25 +01:00
Robert Richter	16e5294e5f	perf, x86: Force IBS LVT offset assignment for family 10h On AMD family 10h we see firmware bug messages like the following: [Firmware Bug]: cpu 6, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu [Firmware Bug]: cpu 6, IBS interrupt offset 0 not available (MSRC001103A=0x0000000000000100) [Firmware Bug]: using offset 1 for IBS interrupts [Firmware Bug]: workaround enabled for IBS LVT offset perf: AMD IBS detected (0x00000007) We always see this, since the offsets are not assigned by the BIOS for this family. Force LVT offset assignment in this case. If the OS assignment fails, fallback to BIOS settings and try to setup this. The fallback to BIOS settings weakens the family check since force_ibs_eilvt_setup() may fail e.g. in case of virtual machines. But setup may still succeed if BIOS offsets are correct. Other families don't have a workaround implemented that assigns LVT offsets. It's ok, to drop calling force_ibs_eilvt_setup() for that families. With the patch the [Firmware Bug] messages vanish. We see now: IBS: LVT offset 1 assigned perf: AMD IBS detected (0x00000007) Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111109162225.GO12451@erda.amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 09:32:59 +01:00
Peter Zijlstra	6a600a8b87	perf, x86: Disable PEBS on SandyBridge chips Cc: Stephane Eranian <eranian@google.com> Cc: stable@kernel.org Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 09:32:38 +01:00
Linus Torvalds	8e8da023f5	x86: Fix boot failures on older AMD CPU's People with old AMD chips are getting hung boots, because commit `bcb80e5387` ("x86, microcode, AMD: Add microcode revision to /proc/cpuinfo") moved the microcode detection too early into "early_init_amd()". At that point we are so early in the booth that the exception tables haven't even been set up yet, so the whole rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy); doesn't actually work: if the rdmsr does a GP fault (due to non-existant MSR register on older CPU's), we can't fix it up yet, and the boot fails. Fix it by simply moving the code to a slightly later point in the boot (init_amd() instead of early_init_amd()), since the kernel itself doesn't even really care about the microcode patchlevel at this point (or really ever: it's made available to user space in /proc/cpuinfo, and updated if you do a microcode load). Reported-tested-and-bisected-by: Larry Finger <Larry.Finger@lwfinger.net> Tested-by: Bob Tracy <rct@gherkin.frus.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-12-04 11:57:09 -08:00
Konrad Rzeszutek Wilk	e5fd47bfab	xen/pm_idle: Make pm_idle be default_idle under Xen. The idea behind commit `d91ee5863b` ("cpuidle: replace xen access to x86 pm_idle and default_idle") was to have one call - disable_cpuidle() which would make pm_idle not be molested by other code. It disallows cpuidle_idle_call to be set to pm_idle (which is excellent). But in the select_idle_routine() and idle_setup(), the pm_idle can still be set to either: amd_e400_idle, mwait_idle or default_idle. This depends on some CPU flags (MWAIT) and in AMD case on the type of CPU. In case of mwait_idle we can hit some instances where the hypervisor (Amazon EC2 specifically) sets the MWAIT and we get: Brought up 2 CPUs invalid opcode: 0000 [#1] SMP Pid: 0, comm: swapper Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1 RIP: e030:[<ffffffff81015d1d>] [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4 ... Call Trace: [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8 [<ffffffff8149ee78>] cpu_bringup_and_idle+0xe/0x10 RIP [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4 RSP <ffff8801d28ddf10> In the case of amd_e400_idle we don't get so spectacular crashes, but we do end up making an MSR which is trapped in the hypervisor, and then follow it up with a yield hypercall. Meaning we end up going to hypervisor twice instead of just once. The previous behavior before v3.0 was that pm_idle was set to default_idle regardless of select_idle_routine/idle_setup. We want to do that, but only for one specific case: Xen. This patch does that. Fixes RH BZ #739499 and Ubuntu #881076 Reported-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-12-03 10:49:58 -08:00
Tejun Heo	d4bbf7e775	Merge branch 'master' into x86/memblock Conflicts & resolutions: * arch/x86/xen/setup.c `dc91c728fd` "xen: allow extra memory to be in multiple regions" `24aa07882b` "memblock, x86: Replace memblock_x86_reserve/free..." conflicted on xen_add_extra_mem() updates. The resolution is trivial as the latter just want to replace memblock_x86_reserve_range() with memblock_reserve(). * drivers/pci/intel-iommu.c `166e9278a3` "x86/ia64: intel-iommu: move to drivers/iommu/" `5dfe8660a3` "bootmem: Replace work_with_active_regions() with..." conflicted as the former moved the file under drivers/iommu/. Resolved by applying the chnages from the latter on the moved file. * mm/Kconfig `6661672053` "memblock: add NO_BOOTMEM config symbol" `c378ddd53f` "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option" conflicted trivially. Both added config options. Just letting both add their own options resolves the conflict. * mm/memblock.c `d1f0ece6cd` "mm/memblock.c: small function definition fixes" `ed7b56a799` "memblock: Remove memblock_memory_can_coalesce()" confliected. The former updates function removed by the latter. Resolution is trivial. Signed-off-by: Tejun Heo <tj@kernel.org>	2011-11-28 09:46:22 -08:00
Greg Kroah-Hartman	dd7c7c3f69	Merge 3.2-rc3 into tty-next to handle merge conflict in tty_ldisc.c Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-11-26 20:07:25 -08:00
Deepak Saxena	b7743970b0	time: x86: Remove CLOCK_TICK_RATE from tsc code The tsc code uses CLOCK_TICK_RATE which on x86 is defined to just be the same as PIT_TICK_RATE. This patch updates the code use the later as we want to depecrate and remove the global CLOCK_TICK_RATE symbol. Signed-off-by: Deepak Saxena <dsaxena@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>	2011-11-21 19:00:56 -08:00
Linus Torvalds	a4cc3889f7	Merge branch 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM guest: prevent tracing recursion with kvmclock Revert "KVM: PPC: Add support for explicit HIOR setting" KVM: VMX: Check for automatic switch msr table overflow KVM: VMX: Add support for guest/host-only profiling KVM: VMX: add support for switching of PERF_GLOBAL_CTRL KVM: s390: announce SYNC_MMU KVM: s390: Fix tprot locking KVM: s390: handle SIGP sense running intercepts KVM: s390: Fix RUNNING flag misinterpretation	2011-11-20 14:57:43 -08:00
Avi Kivity	95ef1e5292	KVM guest: prevent tracing recursion with kvmclock Prevent tracing of preempt_disable() in get_cpu_var() in kvm_clock_read(). When CONFIG_DEBUG_PREEMPT is enabled, preempt_disable/enable() are traced and this causes the function_graph tracer to go into an infinite recursion. By open coding the preempt_disable() around the get_cpu_var(), we can use the notrace version which prevents preempt_disable/enable() from being traced and prevents the recursion. Based on a similar patch for Xen from Jeremy Fitzhardinge. Tested-by: Gleb Natapov <gleb@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-11-20 10:53:48 +02:00
H. Peter Anvin	61f1e7e208	x86, syscall: Re-fix typo in comment Fix the same typo as was fixed in: `b7641d2c` x86-64, syscall: Adjust comment spacing and remove typo ... for the new versions of this file (32-bit and IA32 compat). Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1321569446-20433-4-git-send-email-hpa@linux.intel.com	2011-11-18 16:25:07 -08:00
H. Peter Anvin	303395ac3b	x86: Generate system call tables and unistd_.h from tables Generate system call tables and unistd_.h automatically from the tables in arch/x86/syscalls. All other information, like NR_syscalls, is auto-generated, some of which is in asm-offsets_*.c. This allows us to keep all the system call information in one place, and allows for kernel space and user space to see different information; this is currently used for the ia32 system call numbers when building the 64-bit kernel, but will be used by the x32 ABI in the near future. This also removes some gratuitious differences between i386, x86-64 and ia32; in particular, now all system call tables are generated with the same mechanism. Cc: H. J. Lu <hjl.tools@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-11-17 13:35:37 -08:00
H. Peter Anvin	b7641d2c83	x86-64, syscall: Adjust comment spacing and remove typo Adjust spacing for comment so that it matches the multiline comment style used in the rest of the kernel, and remove word duplication. It is not really clear what version of gcc this refers to, but the extra & doesn't cause any harm, so there is no reason to remove it. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-11-17 13:35:35 -08:00
Mika Westerberg	b82e324b3c	serial, mfd: don't hardcode the console Add support to specify which HSU port to use as an early console. This can be selected by passing "earlyprintk=hsu<n>" on the kernel command line. By default port 0 is still used. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-11-15 15:50:30 -08:00
Alex Williamson	bcb71abe7d	iommu: Add option to group multi-function devices The option iommu=group_mf indicates the that the iommu driver should expose all functions of a multi-function PCI device as the same iommu_device_group. This is useful for disallowing individual functions being exposed as independent devices to userspace as there are often hidden dependencies. Virtual functions are not affected by this option. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2011-11-15 12:22:31 +01:00
Rabin Vincent	78345d2edc	x86: Call stop_machine_text_poke() on all CPUs It appears that stop_machine_text_poke() wants to be called on all CPUs, like it's done from text_poke_smp(). Fix text_poke_smp_batch() to do this. Signed-off-by: Rabin Vincent <rabin@rab.in> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Jason Baron <jbaron@redhat.com> Link: http://lkml.kernel.org/r/1319702072-32676-1-git-send-email-rabin@rab.in Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-14 13:05:15 +01:00
Peter Zijlstra	ed13ec58bf	perf/x86: Enable raw event access to Intel offcore events Now that the core offcore support is fixed up (thanks Stephane) and we have sane generic events utilizing them, re-enable the raw access to the feature as well. Note that it doesn't matter if you use event 0x1b7 or 0x1bb to specify an offcore event, either one works and neither guarantees you'll end up on a particular offcore MSR. Based on original patch from: Vince Weaver <vweaver1@eecs.utk.edu>. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vince Weaver <vweaver1@eecs.utk.edu>. Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1108031200390.703@cl320.eecs.utk.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-14 13:03:44 +01:00
Peter Zijlstra	aa2bc1ade5	perf: Don't use -ENOSPC for out of PMU resources People (Linus) objected to using -ENOSPC to signal not having enough resources on the PMU to satisfy the request. Use -EINVAL. Requested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Daney <david.daney@cavium.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-xv8geaz2zpbjhlx0svmpp28n@git.kernel.org [ merged to newer kernel, fixed up MIPS impact ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-14 13:01:24 +01:00
Peter Zijlstra	57d1c0c03c	perf/x86: Fix PEBS instruction unwind Masami spotted that we always try to decode the instruction stream as 64bit instructions when running a 64bit kernel, this doesn't work for ia32-compat proglets. Use TIF_IA32 to detect if we need to use the 32bit instruction decoder. Reported-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: stable@kernel.org Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-14 13:01:20 +01:00
Mathias Nyman	6fd36ba021	x86, ioapic: Only print ioapic debug information for IRQs belonging to an ioapic chip with "apic=verbose" the print_IO_APIC() function tries to print IRQ to pin mappings for every active irq. It assumes chip_data is of type irq_cfg and may cause an oops if not. As the print_IO_APIC() is called from a late_initcall other chained irq chips may already be registered with custom chip_data information, causing an oops. This is the case with intel MID SoC devices with gpio demuxers registered as irq_chips. Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> [ -v2: fixed build failure ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-10 18:31:23 +01:00
Jacob Pan	064a59b6dd	x86/mrst: Avoid reporting wrong nmi status Moorestown/Medfield platform does not have port 0x61 to report NMI status, nor does it have external NMI sources. The only NMI sources are from lapic, as results of perf counter overflow or IPI, e.g. NMI watchdog or spin lock debug. Reading port 0x61 on Moorestown will return 0xff which misled NMI handlers to false critical errors such memory parity error. The subsequent ioport access for NMI handling can also cause undefined behavior on Moorestown. This patch allows kernel process NMI due to watchdog or backrace dump without unnecessary hangs. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> [hand applied] Signed-off-by: Alan Cox <alan@linux.intel.com>	2011-11-10 16:21:01 +01:00
Jacob Pan	1ade93efd0	x86/apic: Allow use of lapic timer early calibration result lapic timer calibration can be combined with tsc in platform specific calibration functions. if such calibration result is obtained early, we can skip the redundant calibration loops. Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-10 16:20:57 +01:00
Jacob Pan	bb84ac2d3a	x86/apic: Do not clear nr_irqs_gsi if no legacy irqs nr_legacy_irqs is set in probe_nr_irqs_gsi, we should not clear it after that. Otherwise, the result is that MSI irqs will be allocated from the wrong range for the systems without legacy PIC. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-10 16:20:55 +01:00
Feng Tang	cf8ff6b6ab	x86/platform: Add a wallclock_init func to x86_platforms ops Some wall clock devices use MMIO based HW register, this new function will give them a chance to do some initialization work before their get/set_time service get called. Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-10 16:20:53 +01:00
Luck, Tony	66f5ddf30a	x86/mce: Make mce_chrdev_ops 'static const' Arjan would like to make struct file_operations const, but mce-inject directly writes to the mce_chrdev_ops to install its write handler. In an ideal world mce-inject would have its own character device, but we have a sizable legacy of test scripts that hardwire "/dev/mcelog", so it would be painful to switch to a separate device now. Instead, this patch switches to a stub function in the mce code, with a registration helper that mce-inject can call when it is loaded. Note that this would also allow for a sane process to allow mce-inject to be unloaded again (with an unregister function, and appropriate module_{get,put}() calls), but that is left for potential future patches. Reported-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/4eb2e1971326651a3b@agluck-desktop.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-08 16:17:11 +01:00
Linus Torvalds	b32fc0a062	Merge branch 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: jump-label: initialize jump-label subsystem much earlier x86/jump_label: add arch_jump_label_transform_static() s390/jump-label: add arch_jump_label_transform_static() jump_label: add arch_jump_label_transform_static() to optimise non-live code updates sparc/jump_label: drop arch_jump_label_text_poke_early() x86/jump_label: drop arch_jump_label_text_poke_early() jump_label: if a key has already been initialized, don't nop it out stop_machine: make stop_machine safe and efficient to call early jump_label: use proper atomic_t initializer Conflicts: - arch/x86/kernel/jump_label.c Added __init_or_module to arch_jump_label_text_poke_early vs removal of that function entirely - kernel/stop_machine.c same patch ("stop_machine: make stop_machine safe and efficient to call early") merged twice, with whitespace fix in one version	2011-11-06 20:20:46 -08:00
Linus Torvalds	32aaeffbd4	Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h	2011-11-06 19:44:47 -08:00
Linus Torvalds	6681ba7ec4	Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (21 commits) MAINTAINERS: add an entry for Edac Sandy Bridge driver edac: tag sb_edac as EXPERIMENTAL, as it requires more testing EDAC: Fix incorrect edac mode reporting in sb_edac edac: sb_edac: Add it to the building system edac: Add an experimental new driver to support Sandy Bridge CPU's i7300_edac: Fix error cleanup logic i7core_edac: Initialize memory name with cpu, channel, bank i7core_edac: Fix compilation on 32 bits arch i7core_edac: scrubbing fixups EDAC: Correct Kconfig dependencies i7core_edac: return -ENODEV if no MC is found i7core_edac: use edac's own way to print errors MAINTAINERS: remove dropped edac_mce.* from the file i7core_edac: Drop the edac_mce facility x86, MCE: Use notifier chain only for MCE decoding EDAC i7core: Use mce socketid for better compatibility i7core_edac: Don't enable memory scrubbing for Xeon 35xx i7core_edac: Add scrubbing support edac: Move edac main structs to include/linux/edac.h i7core_edac: Fix oops when trying to inject errors ...	2011-11-02 16:55:15 -07:00
Borislav Petkov	4140c54266	i7core_edac: Drop the edac_mce facility Remove edac_mce pieces and use the normal MCE decoder notifier chain by retaining the same functionality with considerably less code. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2011-11-01 10:01:24 -02:00
Christopher Yeoh	fcf634098c	Cross Memory Attach The basic idea behind cross memory attach is to allow MPI programs doing intra-node communication to do a single copy of the message rather than a double copy of the message via shared memory. The following patch attempts to achieve this by allowing a destination process, given an address and size from a source process, to copy memory directly from the source process into its own address space via a system call. There is also a symmetrical ability to copy from the current process's address space into a destination process's address space. - Use of /proc/pid/mem has been considered, but there are issues with using it: - Does not allow for specifying iovecs for both src and dest, assuming preadv or pwritev was implemented either the area read from or written to would need to be contiguous. - Currently mem_read allows only processes who are currently ptrace'ing the target and are still able to ptrace the target to read from the target. This check could possibly be moved to the open call, but its not clear exactly what race this restriction is stopping (reason appears to have been lost) - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix domain socket is a bit ugly from a userspace point of view, especially when you may have hundreds if not (eventually) thousands of processes that all need to do this with each other - Doesn't allow for some future use of the interface we would like to consider adding in the future (see below) - Interestingly reading from /proc/pid/mem currently actually involves two copies! (But this could be fixed pretty easily) As mentioned previously use of vmsplice instead was considered, but has problems. Since you need the reader and writer working co-operatively if the pipe is not drained then you block. Which requires some wrapping to do non blocking on the send side or polling on the receive. In all to all communication it requires ordering otherwise you can deadlock. And in the example of many MPI tasks writing to one MPI task vmsplice serialises the copying. There are some cases of MPI collectives where even a single copy interface does not get us the performance gain we could. For example in an MPI_Reduce rather than copy the data from the source we would like to instead use it directly in a mathops (say the reduce is doing a sum) as this would save us doing a copy. We don't need to keep a copy of the data from the source. I haven't implemented this, but I think this interface could in the future do all this through the use of the flags - eg could specify the math operation and type and the kernel rather than just copying the data would apply the specified operation between the source and destination and store it in the destination. Although we don't have a "second user" of the interface (though I've had some nibbles from people who may be interested in using it for intra process messaging which is not MPI). This interface is something which hardware vendors are already doing for their custom drivers to implement fast local communication. And so in addition to this being useful for OpenMPI it would mean the driver maintainers don't have to fix things up when the mm changes. There was some discussion about how much faster a true zero copy would go. Here's a link back to the email with some testing I did on that: http://marc.info/?l=linux-mm&m=130105930902915&w=2 There is a basic man page for the proposed interface here: http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt This has been implemented for x86 and powerpc, other architecture should mainly (I think) just need to add syscall numbers for the process_vm_readv and process_vm_writev. There are 32 bit compatibility versions for 64-bit kernels. For arch maintainers there are some simple tests to be able to quickly verify that the syscalls are working correctly here: http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: <linux-man@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-10-31 17:30:44 -07:00
Paul Gortmaker	69c60c88ee	x86: Fix files explicitly requiring export.h for EXPORT_SYMBOL/THIS_MODULE These files were implicitly getting EXPORT_SYMBOL via device.h which was including module.h, but that will be fixed up shortly. By fixing these now, we can avoid seeing things like: arch/x86/kernel/rtc.c:29: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’ arch/x86/kernel/pci-dma.c:20: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’ arch/x86/kernel/e820.c:69: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL_GPL’ [ with input from Randy Dunlap <rdunlap@xenotime.net> and also from Stephen Rothwell <sfr@canb.auug.org.au> ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:30:35 -04:00
Paul Gortmaker	2957402297	x86: fix implicit include of <linux/topology.h> in vsyscall_64 In removing the presence of <linux/module.h> from some of the more common <linux/something.h> files, this implict include of <linux/topology.h> was uncovered. CC arch/x86/kernel/vsyscall_64.o arch/x86/kernel/vsyscall_64.c: In function ‘vsyscall_set_cpu’: arch/x86/kernel/vsyscall_64.c:259: error: implicit declaration of function ‘cpu_to_node’ Explicitly call it out so the cleanup can take place. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:30:34 -04:00
Borislav Petkov	f0cb545243	x86, MCE: Use notifier chain only for MCE decoding Drop the edac_mce custom hook in favor of the generic notifier mechanism. Also, do not log the error to mcelog if the notified agent was able to decode it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2011-10-31 15:10:05 -02:00
Linus Torvalds	d630ba565f	Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: uv2: Workaround for UV2 Hub bug (system global address format)	2011-10-28 05:43:56 -07:00
Linus Torvalds	8e6d539e0f	Merge branch 'x86-rdrand-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-rdrand-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, random: Verify RDRAND functionality and allow it to be disabled x86, random: Architectural inlines to get random integers with RDRAND random: Add support for architectural random hooks Fix up trivial conflicts in drivers/char/random.c: the architectural random hooks touched "get_random_int()" that was simplified to use MD5 and not do the keyptr thing any more (see commit `6e5714eaf7`: "net: Compute protocol sequence numbers and fragment IDs using MD5").	2011-10-28 05:29:07 -07:00
Linus Torvalds	8237eb946a	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode, AMD: Add microcode revision to /proc/cpuinfo x86, microcode: Correct microcode revision format coretemp: Get microcode revision from cpu_data x86, intel: Use c->microcode for Atom errata check x86, intel: Output microcode revision in /proc/cpuinfo x86, microcode: Don't request microcode from userspace unnecessarily Fix up trivial conflicts in arch/x86/kernel/cpu/amd.c (conflict between moving AMD BSP code to cpu_dev helper function and adding AMD microcode revision to /proc/cpuinfo code)	2011-10-28 05:14:48 -07:00
Linus Torvalds	cc21fe518a	Merge branch 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Hyper-V: Integrate the clocksource with Hyper-V detection code Fix up conflicts in drivers/staging/hv/Makefile manually (some of the hv code has moved out of staging to drivers/hv/)	2011-10-28 05:08:40 -07:00
Linus Torvalds	e34eb39c1c	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, amd: Include linux/elf.h since we use stuff from asm/elf.h x86: cache_info: Update calculation of AMD L3 cache indices x86: cache_info: Kill the atomic allocation in amd_init_l3_cache() x86: cache_info: Kill the moronic shadow struct x86: cache_info: Remove bogus free of amd_l3_cache data x86, amd: Include elf.h explicitly, prepare the code for the module.h split x86-32, amd: Move va_align definition to unbreak 32-bit build x86, amd: Move BSP code to cpu_dev helper x86: Add a BSP cpu_dev helper x86, amd: Avoid cache aliasing penalties on AMD family 15h	2011-10-28 05:03:12 -07:00
Linus Torvalds	dfa4a423cf	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64, unistd: Remove bogus __IGNORE_getcpu x86, mm, trivial: Remove unnecessary get_order() in free_thread_info() x86, cleanup: Remove unneeded version.h include from arch/x86/	2011-10-26 17:43:08 +02:00
Linus Torvalds	7b86572a7a	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64: Fix CFI data for interrupt frames x86-64: Don't apply destructive erratum workaround on unaffected CPUs	2011-10-26 17:42:03 +02:00
Linus Torvalds	0791e98dd1	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/irq: Standardize on CONFIG_SPARSE_IRQ=y x86, ioapic: Clean up ioapic/apic_id usage x86, ioapic: Factor out print_IO_APIC() to only print one io apic x86, ioapic: Print out irte with right ioapic index x86, ioapic: Split up setup_ioapic_entry() x86, ioapic: Pass struct irq_attr * to setup_ioapic_irq() apic, i386/bigsmp: Fix false warnings regarding logical APIC ID mismatches	2011-10-26 17:30:33 +02:00
Linus Torvalds	7115e3fcf4	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (121 commits) perf symbols: Increase symbol KSYM_NAME_LEN size perf hists browser: Refuse 'a' hotkey on non symbolic views perf ui browser: Use libslang to read keys perf tools: Fix tracing info recording perf hists browser: Elide DSO column when it is set to just one DSO, ditto for threads perf hists: Don't consider filtered entries when calculating column widths perf hists: Don't decay total_period for filtered entries perf hists browser: Honour symbol_conf.show_{nr_samples,total_period} perf hists browser: Do not exit on tab key with single event perf annotate browser: Don't change selection line when returning from callq perf tools: handle endianness of feature bitmap perf tools: Add prelink suggestion to dso update message perf script: Fix unknown feature comment perf hists browser: Apply the dso and thread filters when merging new batches perf hists: Move the dso and thread filters from hist_browser perf ui browser: Honour the xterm colors perf top tui: Give color hints just on the percentage, like on --stdio perf ui browser: Make the colors configurable and change the defaults perf tui: Remove unneeded call to newtCls on startup perf hists: Don't format the percentage on hist_entry__snprintf ... Fix up conflicts in arch/x86/kernel/kprobes.c manually. Ingo's tree did the insane "add volatile to const array", which just doesn't make sense ("volatile const"?). But we could remove the const and make the array volatile to make doubly sure that gcc doesn't optimize it away.. Also fix up kernel/trace/ring_buffer.c non-data-conflicts manually: the reader_lock has been turned into a raw lock by the core locking merge, and there was a new user of it introduced in this perf core merge. Make sure that new use also uses the raw accessor functions.	2011-10-26 17:03:38 +02:00
Linus Torvalds	3cfef95246	Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) rtmutex: Add missing rcu_read_unlock() in debug_rt_mutex_print_deadlock() lockdep: Comment all warnings lib: atomic64: Change the type of local lock to raw_spinlock_t locking, lib/atomic64: Annotate atomic64_lock::lock as raw locking, x86, iommu: Annotate qi->q_lock as raw locking, x86, iommu: Annotate irq_2_ir_lock as raw locking, x86, iommu: Annotate iommu->register_lock as raw locking, dma, ipu: Annotate bank_lock as raw locking, ARM: Annotate low level hw locks as raw locking, drivers/dca: Annotate dca_lock as raw locking, powerpc: Annotate uic->lock as raw locking, x86: mce: Annotate cmci_discover_lock as raw locking, ACPI: Annotate c3_lock as raw locking, oprofile: Annotate oprofilefs lock as raw locking, video: Annotate vga console lock as raw locking, latencytop: Annotate latency_lock as raw locking, timer_stats: Annotate table_lock as raw locking, rwsem: Annotate inner lock as raw locking, semaphores: Annotate inner lock as raw locking, sched: Annotate thread_group_cputimer as raw ... Fix up conflicts in kernel/posix-cpu-timers.c manually: making cputimer->cputime a raw lock conflicted with the ABBA fix in commit `bcd5cff721` ("cputimer: Cure lock inversion").	2011-10-26 16:17:32 +02:00
Linus Torvalds	982653009b	Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, ioapic: Consolidate the explicit EOI code x86, ioapic: Restore the mask bit correctly in eoi_ioapic_irq() x86, kdump, ioapic: Reset remote-IRR in clear_IO_APIC iommu: Rename the DMAR and INTR_REMAP config options x86, ioapic: Define irq_remap_modify_chip_defaults() x86, msi, intr-remap: Use the ioapic set affinity routine iommu: Cleanup ifdefs in detect_intel_iommu() iommu: No need to set dmar_disabled in check_zero_address() iommu: Move IOMMU specific code to intel-iommu.c intr_remap: Call dmar_dev_scope_init() explicitly x86, x2apic: Enable the bios request for x2apic optout	2011-10-26 16:11:53 +02:00
Jeremy Fitzhardinge	e71a5be15e	x86/jump_label: add arch_jump_label_transform_static() This allows jump-label entries to be cheaply updated on code which is not yet live. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Jason Baron <jbaron@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2011-10-25 11:54:42 -07:00
Jeremy Fitzhardinge	b7e3155818	x86/jump_label: drop arch_jump_label_text_poke_early() It is no longer used. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Jason Baron <jbaron@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2011-10-25 11:54:21 -07:00
Linus Torvalds	59e5253417	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits) MAINTAINERS: linux-m32r is moderated for non-subscribers linux@lists.openrisc.net is moderated for non-subscribers Drop default from "DM365 codec select" choice parisc: Kconfig: cleanup Kernel page size default Kconfig: remove redundant CONFIG_ prefix on two symbols cris: remove arch/cris/arch-v32/lib/nand_init.S microblaze: add missing CONFIG_ prefixes h8300: drop puzzling Kconfig dependencies MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers tty: drop superfluous dependency in Kconfig ARM: mxc: fix Kconfig typo 'i.MX51' Fix file references in Kconfig files aic7xxx: fix Kconfig references to READMEs Fix file references in drivers/ide/ thinkpad_acpi: Fix printk typo 'bluestooth' bcmring: drop commented out line in Kconfig btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888' doc: raw1394: Trivial typo fix CIFS: Don't free volume_info->UNC until we are entirely done with it. treewide: Correct spelling of successfully in comments ...	2011-10-25 12:11:02 +02:00
Josh Stone	315eb8a2a1	x86: Fix compilation bug in kprobes' twobyte_is_boostable When compiling an i386_defconfig kernel with gcc-4.6.1-9.fc15.i686, I noticed a warning about the asm operand for test_bit in kprobes' can_boost. I discovered that this caused only the first long of twobyte_is_boostable[] to be output. Jakub filed and fixed gcc PR50571 to correct the warning and this output issue. But to solve it for less current gcc, we can make kprobes' twobyte_is_boostable[] non-const, and it won't be optimized out. Before: CC arch/x86/kernel/kprobes.o In file included from include/linux/bitops.h:22:0, from include/linux/kernel.h:17, from [...]/arch/x86/include/asm/percpu.h:44, from [...]/arch/x86/include/asm/current.h:5, from [...]/arch/x86/include/asm/processor.h:15, from [...]/arch/x86/include/asm/atomic.h:6, from include/linux/atomic.h:4, from include/linux/mutex.h:18, from include/linux/notifier.h:13, from include/linux/kprobes.h:34, from arch/x86/kernel/kprobes.c:43: [...]/arch/x86/include/asm/bitops.h: In function ‘can_boost.part.1’: [...]/arch/x86/include/asm/bitops.h:319:2: warning: use of memory input without lvalue in asm operand 1 is deprecated [enabled by default] $ objdump -rd arch/x86/kernel/kprobes.o \| grep -A1 -w bt 551: 0f a3 05 00 00 00 00 bt %eax,0x0 554: R_386_32 .rodata.cst4 $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o arch/x86/kernel/kprobes.o: file format elf32-i386 Contents of section .data: 0000 48000000 00000000 00000000 00000000 H............... Contents of section .rodata.cst4: 0000 4c030000 L... Only a single long of twobyte_is_boostable[] is in the object file. After, without the const on twobyte_is_boostable: $ objdump -rd arch/x86/kernel/kprobes.o \| grep -A1 -w bt 551: 0f a3 05 20 00 00 00 bt %eax,0x20 554: R_386_32 .data $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o arch/x86/kernel/kprobes.o: file format elf32-i386 Contents of section .data: 0000 48000000 00000000 00000000 00000000 H............... 0010 00000000 00000000 00000000 00000000 ................ 0020 4c030000 0f000200 ffff0000 ffcff0c0 L............... 0030 0000ffff 3bbbfff8 03ff2ebb 26bb2e77 ....;.......&..w Now all 32 bytes are output into .data instead. Signed-off-by: Josh Stone <jistone@redhat.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-10-25 09:00:14 +02:00
Borislav Petkov	bcb80e5387	x86, microcode, AMD: Add microcode revision to /proc/cpuinfo Enable microcode revision output for AMD after `506ed6b53e` ("x86, intel: Output microcode revision in /proc/cpuinfo") did it for Intel. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-10-19 16:07:30 +02:00
Borislav Petkov	881e23e567	x86, microcode: Correct microcode revision format `506ed6b53e` ("x86, intel: Output microcode revision in /proc/cpuinfo") added microcode revision format to /proc/cpuinfo and the MCE handler in decimal format but both AMD and Intel patch levels are handled as hex numbers. Fix it. Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-10-19 15:47:48 +02:00
Josh Stone	db45bd90be	x86, perf, kprobes: Make kprobes's twobyte_is_boostable volatile When compiling an i386_defconfig kernel with gcc-4.6.1-9.fc15.i686, I noticed a warning about the asm operand for test_bit in kprobes' can_boost. I discovered that this caused only the first long of twobyte_is_boostable[] to be output. Jakub filed and fixed gcc PR50571 to correct the warning and this output issue. But to solve it for less current gcc, we can make kprobes' twobyte_is_boostable[] volatile, and it won't be optimized out. Before: CC arch/x86/kernel/kprobes.o In file included from include/linux/bitops.h:22:0, from include/linux/kernel.h:17, from [...]/arch/x86/include/asm/percpu.h:44, from [...]/arch/x86/include/asm/current.h:5, from [...]/arch/x86/include/asm/processor.h:15, from [...]/arch/x86/include/asm/atomic.h:6, from include/linux/atomic.h:4, from include/linux/mutex.h:18, from include/linux/notifier.h:13, from include/linux/kprobes.h:34, from arch/x86/kernel/kprobes.c:43: [...]/arch/x86/include/asm/bitops.h: In function ‘can_boost.part.1’: [...]/arch/x86/include/asm/bitops.h:319:2: warning: use of memory input without lvalue in asm operand 1 is deprecated [enabled by default] $ objdump -rd arch/x86/kernel/kprobes.o \| grep -A1 -w bt 551: 0f a3 05 00 00 00 00 bt %eax,0x0 554: R_386_32 .rodata.cst4 $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o arch/x86/kernel/kprobes.o: file format elf32-i386 Contents of section .data: 0000 48000000 00000000 00000000 00000000 H............... Contents of section .rodata.cst4: 0000 4c030000 L... Only a single long of twobyte_is_boostable[] is in the object file. After, with volatile: $ objdump -rd arch/x86/kernel/kprobes.o \| grep -A1 -w bt 551: 0f a3 05 20 00 00 00 bt %eax,0x20 554: R_386_32 .data $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o arch/x86/kernel/kprobes.o: file format elf32-i386 Contents of section .data: 0000 48000000 00000000 00000000 00000000 H............... 0010 00000000 00000000 00000000 00000000 ................ 0020 4c030000 0f000200 ffff0000 ffcff0c0 L............... 0030 0000ffff 3bbbfff8 03ff2ebb 26bb2e77 ....;.......&..w Now all 32 bytes are output into .data instead. Signed-off-by: Josh Stone <jistone@redhat.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Jakub Jelinek <jakub@redhat.com> Link: http://lkml.kernel.org/r/1318899645-4068-1-git-send-email-jistone@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-18 08:43:08 +02:00
Andi Kleen	30963c0ac7	x86, intel: Use c->microcode for Atom errata check Now that the cpu update level is available the Atom PSE errata check can use it directly without reading the MSR again. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1318466795-7393-2-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-14 13:16:38 +02:00
Andi Kleen	506ed6b53e	x86, intel: Output microcode revision in /proc/cpuinfo I got a request to make it easier to determine the microcode update level on Intel CPUs. This patch adds a new "microcode" field to /proc/cpuinfo. The microcode level is also outputed on fatal machine checks together with the other CPUID model information. I removed the respective code from the microcode update driver, it just reads the field from cpu_data. Also when the microcode is updated it fills in the new values too. I had to add a memory barrier to native_cpuid to prevent it being optimized away when the result is not used. This turns out to clean up further code which already got this information manually. This is done in followon patches. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1318466795-7393-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-14 13:16:35 +02:00
Srivatsa S. Bhat	70989449da	x86, microcode: Don't request microcode from userspace unnecessarily Requesting the microcode from userspace every time when onlining CPUs (during a CPU hotplug operation) is unnecessary. Thus, ensure that once the kernel gets the microcode after booting, it is not freed nor invalidated when a CPU goes offline, so that it can be reused when that CPU comes back online, without requesting userspace for it again. As a result, the CPU hotplug operations become faster as well. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Link: http://lkml.kernel.org/r/4E91F908.5010006@linux.vnet.ibm.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-10-13 16:20:35 +02:00
Yinghai Lu	141d55e6cc	x86/irq: Standardize on CONFIG_SPARSE_IRQ=y Sparseirq got introduced in v2.6.28 and Thomas did a huge cleanup around v2.6.38 that eliminated basically all disadvantages of it. So we can remove non-sparseirq support now and simplify our IRQ degrees of freedom a bit. Suggested-and-acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/4E95E21D.6090200@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-13 12:12:12 +02:00
Ingo Molnar	910e94dd0c	Merge branch 'tip/perf/core' of git://github.com/rostedt/linux into perf/core	2011-10-12 17:14:47 +02:00
Yinghai Lu	6f50d45fae	x86, ioapic: Clean up ioapic/apic_id usage While looking at the code, apic_id sometime is referred to index of ioapic, but sometime is used for phys apic id. and some even use apic for real apic id. It is very confusing. So try to limit apic_id or ioapic_id to be real apic id for ioapic, and use ioapic_idx for ioapic index in the array. -v2: Suggested by Ingo, use ioapic_idx consistently, instead of ioapic Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/4E9542DC.3090509@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-12 09:55:28 +02:00
Yinghai Lu	cda417dd87	x86, ioapic: Factor out print_IO_APIC() to only print one io apic It is getting too big after the interrupt remaping entries debug print out was added. Original print_IO_APIC() becomes print_IO_APICs(). New print_IO_APIC() will only print one ioapic's registers As a side-effect this clean-up also made checkpatch.pl happier. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/4E9542D3.5000008@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-12 09:55:25 +02:00
Yinghai Lu	3a61d7feca	x86, ioapic: Print out irte with right ioapic index While checking irte dump in dmesg, the print out is confusing ioapic index with real io apic id: IOAPIC[0]: Set routing entry (1-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:1) IOAPIC[1]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:1 Avail:0 Vector:31 Dest:00000001 SID:00FF SQ:0 SVT:1) IOAPIC[0]: Set routing entry (1-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:1) IOAPIC[1]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:1 Avail:0 Vector:30 Dest:00000001 SID:00FF SQ:0 SVT:1) The system's first ioapic id is 1. This commit: \| commit `3040db92ee` \| Author: Naga Chumbalkar <nagananda.chumbalkar@hp.com> \| Date: Tue Jul 12 21:17:41 2011 +0000 \| \| x86, ioapic: Print IRTE when IR is enabled Confused apic_id with the ioapic ID - fix it. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/4E9542C8.8040209@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-12 09:55:21 +02:00
Yinghai Lu	c5b4712c3f	x86, ioapic: Split up setup_ioapic_entry() Ingo pointed out that setup_ioapic_entry() is way too big now. Split the intr-remap code out into setup_ir_ioapic_entry(). Also pass struct io_apic_irq_attr * instead of 5 parameters in those two functions. At last in setup_ir_ioapic_entry() we don't need to panic. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/4E9542BB.4070807@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-12 09:55:18 +02:00
Yinghai Lu	e4aff81182	x86, ioapic: Pass struct irq_attr * to setup_ioapic_irq() Do not expand that struct, and just pass pointer to reduce the number of parameters in related functions. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/4E9542B1.7050800@oracle.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-12 09:55:15 +02:00
Adrian Bunk	2b666859ec	x86: Default to vsyscall=native for now This UML breakage: linux-2.6.30.1[3800] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb9c498 ax:ffffffffff600000 si:0 di:606790 linux-2.6.30.1[3856] vsyscall fault (exploit attempt?) ip:ffffffffff600000 cs:33 sp:7fbfb13168 ax:ffffffffff600000 si:0 di:606790 Is caused by commit `3ae36655` ("x86-64: Rework vsyscall emulation and add vsyscall= parameter") - the vsyscall emulation code is not fully cooked yet as UML relies on some rather fragile SIGSEGV semantics. Linus suggested in https://lkml.org/lkml/2011/8/9/376 to default to vsyscall=native for now, this patch implements that. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Andrew Lutomirski <luto@mit.edu> Cc: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/20111005214047.GE14406@localhost.pp.htv.fi Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-11 08:23:34 +02:00
Ingo Molnar	d48b0e1737	x86, nmi, drivers: Fix nmi splitup build bug nmi.c needs an #include <linux/mca.h>: arch/x86/kernel/nmi.c: In function ‘unknown_nmi_error’: arch/x86/kernel/nmi.c:286:6: error: ‘MCA_bus’ undeclared (first use in this function) arch/x86/kernel/nmi.c:286:6: note: each undeclared identifier is reported only once for each function it appears in Another one is the hpwdt driver: drivers/watchdog/hpwdt.c:507:9: error: ‘NMI_DONE’ undeclared (first use in this function) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:57:21 +02:00
Robert Richter	b716916679	perf, x86: Implement IBS initialization This patch implements IBS feature detection and initialzation. The code is shared between perf and oprofile. If IBS is available on the system for perf, a pmu is setup. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1316597423-25723-3-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:57:16 +02:00
Robert Richter	ee5789dbcc	perf, x86: Share IBS macros between perf and oprofile Moving IBS macros from oprofile to <asm/perf_event.h> to make it available to perf. No additional changes. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1316597423-25723-2-git-send-email-robert.richter@amd.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:57:11 +02:00
Don Zickus	efc3aac5f3	x86, nmi: Track NMI usage stats Now that the NMI handler are broken into lists, increment the appropriate stats for each list. This allows us to see what is going on when they get printed out in the next patch. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317409584-23662-6-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:57:06 +02:00
Don Zickus	b227e23399	x86, nmi: Add in logic to handle multiple events and unknown NMIs Previous patches allow the NMI subsystem to process multipe NMI events in one NMI. As previously discussed this can cause issues when an event triggered another NMI but is processed in the current NMI. This causes the next NMI to go unprocessed and become an 'unknown' NMI. To handle this, we first have to flag whether or not the NMI handler handled more than one event or not. If it did, then there exists a chance that the next NMI might be already processed. Once the NMI is flagged as a candidate to be swallowed, we next look for a back-to-back NMI condition. This is determined by looking at the %rip from pt_regs. If it is the same as the previous NMI, it is assumed the cpu did not have a chance to jump back into a non-NMI context and execute code and instead handled another NMI. If both of those conditions are true then we will swallow any unknown NMI. There still exists a chance that we accidentally swallow a real unknown NMI, but for now things seem better. An optimization has also been added to the nmi notifier rountine. Because x86 can latch up to one NMI while currently processing an NMI, we don't have to worry about executing _all_ the handlers in a standalone NMI. The idea is if multiple NMIs come in, the second NMI will represent them. For those back-to-back NMI cases, we have the potentail to drop NMIs. Therefore only execute all the handlers in the second half of a detected back-to-back NMI. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317409584-23662-5-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:57:01 +02:00
Don Zickus	9c48f1c629	x86, nmi: Wire up NMI handlers to new routines Just convert all the files that have an nmi handler to the new routines. Most of it is straight forward conversion. A couple of places needed some tweaking like kgdb which separates the debug notifier from the nmi handler and mce removes a call to notify_die. [Thanks to Ying for finding out the history behind that mce call https://lkml.org/lkml/2010/5/27/114 And Boris responding that he would like to remove that call because of it https://lkml.org/lkml/2011/9/21/163] The things that get converted are the registeration/unregistration routines and the nmi handler itself has its args changed along with code removal to check which list it is on (most are on one NMI list except for kgdb which has both an NMI routine and an NMI Unknown routine). Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Corey Minyard <minyard@acm.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Corey Minyard <minyard@acm.org> Cc: Jack Steiner <steiner@sgi.com> Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:56:57 +02:00
Don Zickus	c9126b2ee8	x86, nmi: Create new NMI handler routines The NMI handlers used to rely on the notifier infrastructure. This worked great until we wanted to support handling multiple events better. One of the key ideas to the nmi handling is to process _all_ the handlers for each NMI. The reason behind this switch is because NMIs are edge triggered. If enough NMIs are triggered, then they could be lost because the cpu can only latch at most one NMI (besides the one currently being processed). In order to deal with this we have decided to process all the NMI handlers for each NMI. This allows the handlers to determine if they recieved an event or not (the ones that can not determine this will be left to fend for themselves on the unknown NMI list). As a result of this change it is now possible to have an extra NMI that was destined to be received for an already processed event. Because the event was processed in the previous NMI, this NMI gets dropped and becomes an 'unknown' NMI. This of course will cause printks that scare people. However, we prefer to have extra NMIs as opposed to losing NMIs and as such are have developed a basic mechanism to catch most of them. That will be a later patch. To accomplish this idea, I unhooked the nmi handlers from the notifier routines and created a new mechanism loosely based on doIRQ. The reason for this is the notifier routines have a couple of shortcomings. One we could't guarantee all future NMI handlers used NOTIFY_OK instead of NOTIFY_STOP. Second, we couldn't keep track of the number of events being handled in each routine (most only handle one, perf can handle more than one). Third, I wanted to eventually display which nmi handlers are registered in the system in /proc/interrupts to help see who is generating NMIs. The patch below just implements the new infrastructure but doesn't wire it up yet (that is the next patch). Its design is based on doIRQ structs and the atomic notifier routines. So the rcu stuff in the patch isn't entirely untested (as the notifier routines have soaked it) but it should be double checked in case I copied the code wrong. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317409584-23662-3-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:56:52 +02:00
Don Zickus	1d48922c14	x86, nmi: Split out nmi from traps.c The nmi stuff is changing a lot and adding more functionality. Split it out from the traps.c file so it doesn't continue to pollute that file. This makes it easier to find and expand all the future nmi related work. No real functional changes here. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317409584-23662-2-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:56:47 +02:00
Gleb Natapov	144d31e6f1	perf, intel: Use GO/HO bits in perf-ctr Intel does not have guest/host-only bit in perf counters like AMD does. To support GO/HO bits KVM needs to switch EVENTSELn values (or PERF_GLOBAL_CTRL if available) at a guest entry. If a counter is configured to count only in a guest mode it stays disabled in a host, but VMX is configured to switch it to enabled value during guest entry. This patch adds GO/HO tracking to Intel perf code and provides interface for KVM to get a list of MSRs that need to be switched on a guest entry. Only cpus with architectural PMU (v1 or later) are supported with this patch. To my knowledge there is not p6 models with VMX but without architectural PMU and p4 with VMX are rare and the interface is general enough to support them if need arise. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317816084-18026-7-git-send-email-gleb@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-10 06:56:42 +02:00
Joerg Roedel	011af85784	perf, amd: Use GO/HO bits in perf-ctr The AMD perf-counters support counting in guest or host-mode only. Make use of that feature when user-space specified guest/host-mode only counting. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317816084-18026-3-git-send-email-gleb@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-06 13:00:31 +02:00
Ingo Molnar	7b4f86ac05	Merge branch 'ras' of git://amd64.org/linux/bp into perf/core	2011-10-06 12:54:36 +02:00
Ingo Molnar	9d01402023	Merge commit 'v3.1-rc9' into perf/core Merge reason: pick up latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-06 12:49:21 +02:00
Jan Beulich	eab9e6137f	x86-64: Fix CFI data for interrupt frames The patch titled "x86: Don't use frame pointer to save old stack on irq entry" did not properly adjust CFI directives, so this patch is a follow-up to that one. With the old stack pointer no longer stored in a callee-saved register (plus some offset), we now have to use a CFA expression to describe the memory location where it is being found. This requires the use of .cfi_escape (allowing arbitrary byte streams to be emitted into .eh_frame), as there is no .cfi_def_cfa_expression (which also cannot reasonably be expected, as it would require a full expression parser). Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/4E8360200200007800058467@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-28 19:04:52 +02:00
Jan Beulich	838312be46	apic, i386/bigsmp: Fix false warnings regarding logical APIC ID mismatches These warnings (generally one per CPU) are a result of initializing x86_cpu_to_logical_apicid while apic_default is still in use, but the check in setup_local_APIC() being done when apic_bigsmp was already used as an override in default_setup_apic_routing(): Overriding APIC driver with bigsmp Enabling APIC mode: Physflat. Using 5 I/O APICs ------------[ cut here ]------------ WARNING: at .../arch/x86/kernel/apic/apic.c:1239 ... CPU 1 irqstacks, hard=f1c9a000 soft=f1c9c000 Booting Node 0, Processors #1 smpboot cpu 1: start_ip = 9e000 Initializing CPU#1 ------------[ cut here ]------------ WARNING: at .../arch/x86/kernel/apic/apic.c:1239 setup_local_APIC+0x137/0x46b() Hardware name: ... CPU1 logical APIC ID: 2 != 8 ... Fix this (for the time being, i.e. until x86_32_early_logical_apicid() will get removed again, as Tejun says ought to be possible) by overriding the previously stored values at the point where the APIC driver gets overridden. v2: Move this and the pre-existing override logic into arch/x86/kernel/apic/bigsmp_32.c. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: <stable@kernel.org> (2.6.39 and onwards) Link: http://lkml.kernel.org/r/4E835D16020000780005844C@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-28 19:01:53 +02:00
Stephen Rothwell	910b2c5122	x86, amd: Include linux/elf.h since we use stuff from asm/elf.h After merging the moduleh tree, today's linux-next build (x86_64 allmodconfig) failed like this: arch/x86/kernel/sys_x86_64.c:28:10: warning: 'enum align_flags' declared inside parameter list arch/x86/kernel/sys_x86_64.c:28:10: warning: its scope is only this definition or declaration, which is probably not what you want arch/x86/kernel/sys_x86_64.c:28:22: error: parameter 3 ('flags') has incomplete type [...] Presumably caused by the module.h split interacting with a new commit `dfb09f9b7a` ("x86, amd: Avoid cache aliasing penalties on AMD family 15h") from the x8 tree. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/20110928174214.17a58be15d84d67c185930e1@canb.auug.org.au Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-28 10:34:31 +02:00
Randy Dunlap	d6eed550a9	x86: Perf_event_amd.c needs <asm/apicdef.h> Fix (rare) build error by adding <asm/apicdef.h> header file: arch/x86/kernel/cpu/perf_event_amd.c:350:2: error: 'BAD_APICID' undeclared (first use in this function) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Robert Richter <robert.richter@amd.com> Cc: Andre Przywara <andre.przywara@amd.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/4E820138.90301@xenotime.net Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-27 19:55:09 +02:00
Paul Bolle	395cf9691d	doc: fix broken references There are numerous broken references to Documentation files (in other Documentation files, in comments, etc.). These broken references are caused by typo's in the references, and by renames or removals of the Documentation files. Some broken references are simply odd. Fix these broken references, sometimes by dropping the irrelevant text they were part of. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-09-27 18:08:04 +02:00
Kevin Winchester	de0428a7ad	x86, perf: Clean up perf_event cpu code The CPU support for perf events on x86 was implemented via included C files with #ifdefs. Clean this up by creating a new header file and compiling the vendor-specific files as needed. Signed-off-by: Kevin Winchester <kjwinchester@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1314747665-2090-1-git-send-email-kjwinchester@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-26 12:58:00 +02:00

... 3 4 5 6 7 ...

8189 Commits