Commit Graph

9429 Commits

Author SHA1 Message Date
Thomas Gleixner
6dbfe5a57d x86: Fixup last users of irq_chip->typename
The typename member of struct irq_chip was kept for migration purposes
and is obsolete since more than 2 years. Fix up the leftovers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-18 11:45:29 +01:00
Rusty Russell
8dca15e408 [CPUFREQ] speedstep-ich: fix error caused by 394122ab14
"[CPUFREQ] cpumask: avoid playing with cpus_allowed in speedstep-ich.c"
changed the code to mistakenly pass the current cpu as the "processor"
argument of speedstep_get_frequency(), whereas it should be the type of
the processor.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14340

Based on a patch by Dave Mueller.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Dominik Brodowski <linux@brodo.de>
Reported-by: Dave Mueller <dave.mueller@gmx.ch>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-17 23:15:04 -05:00
John Villalovos
293afe44d7 [CPUFREQ] acpi-cpufreq: blacklist Intel 0f68: Fix HT detection and put in notification message
Removing the SMT/HT check, since the Errata doesn't mention
Hyper-Threading.

Adding in a printk, so that the user knows why acpi-cpufreq refuses to
load.  Also, once system is blacklisted, don't repeat checks to see if
blacklisted.  This also causes the message to only be printed once,
rather than for each CPU.

Signed-off-by: John L. Villalovos <john.l.villalovos@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-17 23:15:03 -05:00
Roel Kluin
c53614ec17 [CPUFREQ] powernow-k8: Fix test in get_transition_latency()
Not makes it a bool before the comparison.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-17 23:15:03 -05:00
Krzysztof Helt
f7f3cad060 [CPUFREQ] longhaul: select Longhaul version 2 for capable CPUs
There is a typo in the longhaul detection code so only Longhaul v1 or Longhaul v3
is selected. The Longhaul v2 is not selected even for CPUs which are capable of.

Tested on PCChips Giga Pro board. Frequency changes work and the Longhaul v2
detects that the board is not capable of changing CPU voltage.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
2009-11-17 23:15:03 -05:00
Yinghai Lu
508d85c2c6 x86: When cleaning MTRRs, do not fold WP into UC
The current MTRR code treats WP as a form of UC.  This really isn't
desirable behaviour, except possibly in the case of severe MTRR
shortage.  Disable this, to allow legitimate uses of WP to remain
unmolested.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-17 10:26:53 -08:00
Lin Ming
0696b711e4 timekeeping: Fix clock_gettime vsyscall time warp
Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier
to struct timekeeper" the clock multiplier of vsyscall is updated with
the unmodified clock multiplier of the clock source and not with the
NTP adjusted multiplier of the timekeeper.

This causes user space observerable time warps:
new CLOCK-warp maximum: 120 nsecs,  00000025c337c537 -> 00000025c337c4bf

Add a new argument "mult" to update_vsyscall() and hand in the
timekeeping internal NTP adjusted multiplier.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tony Luck <tony.luck@intel.com>
LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-17 11:52:34 +01:00
Ingo Molnar
a7b63425a4 Merge branch 'perf/core' into perf/probes
Resolved merge conflict in tools/perf/Makefile

Merge reason: we want to queue up a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-17 10:17:47 +01:00
Andreas Herrmann
8cc2361bd0 x86: ucode-amd: Move family check to microcde_amd.c's init function
... to avoid useless trial to load firmware on systems with
unsupported AMD CPUs.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Mohr <andi@lisas.de>
Cc: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20091117070638.GA27691@alberich.amd.com>
[ v2: changed BUG_ON() to WARN_ON() ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-17 10:10:40 +01:00
Eric W. Biederman
bb9074ff58 Merge commit 'v2.6.32-rc7'
Resolve the conflict between v2.6.32-rc7 where dn_def_dev_handler
gets a small bug fix and the sysctl tree where I am removing all
sysctl strategy routines.
2009-11-17 01:01:34 -08:00
Ingo Molnar
123bf0e2ed x86: gart: Clean up the code a bit
Clean up various small stylistic details in the GART code. No
functionality changed.

Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: muli@il.ibm.com
Cc: joerg.roedel@amd.com
LKML-Reference: <1258287594-8777-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-17 07:57:00 +01:00
FUJITA Tomonori
1f7564ca83 x86: Calgary: Remove unnecessary DMA_ERROR_CODE usage
This cleans up iommu_alloc() a bit and removes unnecessary
DMA_ERROR_CODE usage.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: muli@il.ibm.com
Cc: joerg.roedel@amd.com
LKML-Reference: <1258287594-8777-4-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-17 07:53:21 +01:00
FUJITA Tomonori
8fd524b355 x86: Kill bad_dma_address variable
This kills bad_dma_address variable, the old mechanism to enable
IOMMU drivers to make dma_mapping_error() work in IOMMU's
specific way.

bad_dma_address variable was introduced to enable IOMMU drivers
to make dma_mapping_error() work in IOMMU's specific way.
However, it can't handle systems that use both swiotlb and HW
IOMMU. SO we introduced dma_map_ops->mapping_error to solve that
case.

Intel VT-d, GART, and swiotlb already use
dma_map_ops->mapping_error. Calgary, AMD IOMMU, and nommu use
zero for an error dma address. This adds DMA_ERROR_CODE and
converts them to use it (as SPARC and POWER does).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: muli@il.ibm.com
Cc: joerg.roedel@amd.com
LKML-Reference: <1258287594-8777-3-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-17 07:53:21 +01:00
FUJITA Tomonori
42109197eb x86: gart: Add own dma_mapping_error function
GART IOMMU is the only user of bad_dma_address variable.

This patch converts GART to use the newer mechanism, fill in
->mapping_error() in struct dma_map_ops, to make
dma_mapping_error() work in IOMMU specific way.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: muli@il.ibm.com
Cc: joerg.roedel@amd.com
LKML-Reference: <1258287594-8777-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-17 07:53:20 +01:00
Ingo Molnar
99f4c9de2b Merge commit 'v2.6.32-rc7' into core/iommu
Merge reason: Add fixes we'll depend on.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-17 07:51:07 +01:00
Masami Hiramatsu
35039eb6b1 x86: Show symbol name if insn decoder test failed
Show symbol name if insn decoder test find a difference.
This will help us to find out where the issue is.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20091116230624.5250.49813.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-17 07:16:50 +01:00
Masami Hiramatsu
d65ff75fbe x86: Add verbose option to insn decoder test
Add verbose option to insn decoder test. This dumps decoded
instruction when building kernel with V=1.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20091116230618.5250.18762.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-17 07:16:48 +01:00
H. Peter Anvin
5bd085b5fb x86: remove "extern" from function prototypes in <asm/proto.h>
Function prototypes don't need "extern", and it is generally frowned
upon to have them.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-11-16 13:55:31 -08:00
Kees Cook
4b0f3b81eb x86, mm: Report state of NX protections during boot
It is possible for x86_64 systems to lack the NX bit either due to the
hardware lacking support or the BIOS having turned off the CPU capability,
so NX status should be reported.  Additionally, anyone booting NX-capable
CPUs in 32bit mode without PAE will lack NX functionality, so this change
provides feedback for that case as well.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1258154897-6770-6-git-send-email-hpa@zytor.com>
2009-11-16 13:44:59 -08:00
H. Peter Anvin
4763ed4d45 x86, mm: Clean up and simplify NX enablement
The 32- and 64-bit code used very different mechanisms for enabling
NX, but even the 32-bit code was enabling NX in head_32.S if it is
available.  Furthermore, we had a bewildering collection of tests for
the available of NX.

This patch:

a) merges the 32-bit set_nx() and the 64-bit check_efer() function
   into a single x86_configure_nx() function.  EFER control is left
   to the head code.

b) eliminates the nx_enabled variable entirely.  Things that need to
   test for NX enablement can verify __supported_pte_mask directly,
   and cpu_has_nx gives the supported status of NX.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Chris Wright <chrisw@sous-sol.org>
LKML-Reference: <1258154897-6770-5-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
2009-11-16 13:44:59 -08:00
H. Peter Anvin
583140afb9 x86, pageattr: Make set_memory_(x|nx) aware of NX support
Make set_memory_x/set_memory_nx directly aware of if NX is supported
in the system or not, rather than requiring that every caller assesses
that support independently.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tim Starling <tstarling@wikimedia.org>
Cc: Hannes Eder <hannes@hanneseder.net>
LKML-Reference: <1258154897-6770-4-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
2009-11-16 13:44:58 -08:00
H. Peter Anvin
a7c4c0d934 x86, sleep: Always save the value of EFER
Always save the value of EFER, regardless of the state of NX.  Since
EFER may not actually exist, use rdmsr_safe() to do so.

v2: check the return value from rdmsr_safe() instead of relying on
    the output values being unchanged on error.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@tuxonice.net>
LKML-Reference: <1258154897-6770-3-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
2009-11-16 13:44:57 -08:00
H. Peter Anvin
8a50e5135a x86-32: Use symbolic constants, safer CPUID when enabling EFER.NX
Use symbolic constants rather than hard-coded values when setting
EFER.NX in head_32.S, and do a more rigorous test for the validity of
the response when probing for the extended CPUID range.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1258154897-6770-2-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
2009-11-16 13:44:56 -08:00
Cyrill Gorcunov
e79c65a97c x86: io-apic: IO-APIC MMIO should not fail on resource insertion
If IO-APIC base address is 1K aligned we should not fail
on resourse insertion procedure. For this sake we define
IO_APIC_SLOT_SIZE constant which should cover all IO-APIC
direct accessible registers.

An example of a such configuration is there

	http://marc.info/?l=linux-kernel&m=118114792006520

 |
 | Quoting the message
 |
 | IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
 | IOAPIC[1]: apic_id 3, version 32, address 0xfec80000, GSI 24-47
 | IOAPIC[2]: apic_id 4, version 32, address 0xfec80400, GSI 48-71
 | IOAPIC[3]: apic_id 5, version 32, address 0xfec84000, GSI 72-95
 | IOAPIC[4]: apic_id 8, version 32, address 0xfec84400, GSI 96-119
 |

Reported-by: "Maciej W. Rozycki" <macro@linux-mips.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20091116151426.GC5653@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-16 16:37:10 +01:00
Frederic Weisbecker
3c93ca00ee x86: Add missing might_fault() checks to copy_{to,from}_user()
On x86-64, copy_[to|from]_user() rely on assembly routines that
never call might_fault(), making us missing various lockdep
checks.

This doesn't apply to __copy_from,to_user() that explicitly
handle these calls, neither is it a problem in x86-32 where
copy_to,from_user() rely on the "__" prefixed versions that
also call might_fault().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1258382538-30979-1-git-send-email-fweisbec@gmail.com>
[ v2: fix module export ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-16 16:09:52 +01:00
Prarit Bhargava
303fc0870f x86: AMD Northbridge: Verify NB's node is online
Fix panic seen on some IBM and HP systems on 2.6.32-rc6:

 BUG: unable to handle kernel NULL pointer dereference at (null)
 IP: [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
  [...]
  [<ffffffff8120bbde>] cpumask_next_and+0x2e/0x3b
  [<ffffffff81225c62>] pci_device_probe+0x8e/0xf5
  [<ffffffff812b9be6>] ? driver_sysfs_add+0x47/0x6c
  [<ffffffff812b9da5>] driver_probe_device+0xd9/0x1f9
  [<ffffffff812b9f1d>] __driver_attach+0x58/0x7c
  [<ffffffff812b9ec5>] ? __driver_attach+0x0/0x7c
  [<ffffffff812b9298>] bus_for_each_dev+0x54/0x89
  [<ffffffff812b9b4f>] driver_attach+0x19/0x1b
  [<ffffffff812b97ae>] bus_add_driver+0xd3/0x23d
  [<ffffffff812ba1e7>] driver_register+0x98/0x109
  [<ffffffff81225ed0>] __pci_register_driver+0x63/0xd3
  [<ffffffff81072776>] ? up_read+0x26/0x2a
  [<ffffffffa0081000>] ? k8temp_init+0x0/0x20 [k8temp]
  [<ffffffffa008101e>] k8temp_init+0x1e/0x20 [k8temp]
  [<ffffffff8100a073>] do_one_initcall+0x6d/0x185
  [<ffffffff8108d765>] sys_init_module+0xd3/0x236
  [<ffffffff81011ac2>] system_call_fastpath+0x16/0x1b

I put in a printk and commented out the set_dev_node()
call when and got this output:

 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x0
 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x1
 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x2
 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x3

I.e. the issue appears to be that the HW has set val to a valid
value, however, the system is only configured for a single
node -- 0, the others are offline.

Check to see if the node is actually online before setting
the numa node for an AMD northbridge in quirk_amd_nb_node().

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: bhavna.sarathy@amd.com
Cc: jbarnes@virtuousgeek.org
Cc: andreas.herrmann3@amd.com
LKML-Reference: <20091112180933.12532.98685.sendpatchset@prarit.bos.redhat.com>
[ v2: clean up the code and add comments ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-16 15:43:05 +01:00
Thomas Gleixner
411462f62a x86: Fix printk format due to variable type change
clockevents.mult became u32. Fix the printk format.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-16 11:55:20 +01:00
Hiroshi Shimamoto
62ad33f670 x86: Don't put iommu_shutdown_noop() in init section
It causes kernel panic on shutdown or reboot.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <4B00BC8E.50801@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-16 08:58:51 +01:00
Ingo Molnar
39dc78b651 Merge commit 'v2.6.32-rc7' into perf/core
Merge reason: pick up perf fixlets

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-15 09:50:41 +01:00
Jan Beulich
1472248583 x86-64: __copy_from_user_inatomic() adjustments
This v2.6.26 commit:

    ad2fc2c: x86: fix copy_user on x86

rendered __copy_from_user_inatomic() identical to
copy_user_generic(), yet didn't make the former just call the
latter from an inline function.

Furthermore, this v2.6.19 commit:

    b885808: [PATCH] Add proper sparse __user casts to __copy_to_user_inatomic

converted the return type of __copy_to_user_inatomic() from
unsigned long to int, but didn't do the same to
__copy_from_user_inatomic().

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <v.mayatskih@gmail.com>
LKML-Reference: <4AFD5778020000780001F8F4@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-15 09:29:47 +01:00
FUJITA Tomonori
f4131c6259 x86: Make calgary_iommu_init() static
This makes calgary_iommu_init() static and moves it to remove
the forward declaration.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: muli@il.ibm.com
LKML-Reference: <20091114212603U.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-15 09:04:14 +01:00
FUJITA Tomonori
6959450e56 swiotlb: Remove duplicate swiotlb_force extern declarations
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: tony.luck@intel.com
LKML-Reference: <1258199198-16657-4-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-15 09:03:10 +01:00
FUJITA Tomonori
94a15564ac x86: Move iommu_shutdown_noop to x86_init.c
iommu_init_noop() is in arch/x86/kernel/x86_init.c but
iommu_shutdown_noop() in arch/x86/include/asm/iommu.h.

This moves iommu_shutdown_noop() to x86_init.c for consistency.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1258199198-16657-3-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-15 09:03:10 +01:00
FUJITA Tomonori
a3b28ee109 x86: Set dma_ops to nommu_dma_ops by default
We set dma_ops to nommu_dma_ops at two different places for
x86_32 and x86_64. This unifies them by setting dma_ops to
nommu_dma_ops by default.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1258199198-16657-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-15 09:03:09 +01:00
Ingo Molnar
68efa37df7 hw-breakpoints, x86: Fix modular KVM build
This build error:

arch/x86/kvm/x86.c:3655: error: implicit declaration of function 'hw_breakpoint_restore'

Happens because in the CONFIG_KVM=m case there's no 'CONFIG_KVM' define
in the kernel - it's CONFIG_KVM_MODULE in that case.

Make the prototype available unconditionally.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <1258114575-32655-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-14 15:32:53 +01:00
Ingo Molnar
31c997cac7 x86: Fix cpu_devs[] initialization in early_cpu_init()
Yinghai Lu noticed that this commit:

  0388423: x86: Minimise printk spew from per-vendor init code

mistakenly left out the initialization of cpu_devs[] in the
!PROCESSOR_SELECT case. Fix it.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Cc: Dave Jones <davej@redhat.com>
LKML-Reference: <20091113203000.GA19160@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-14 10:36:50 +01:00
Roland Dreier
b01c845f0f x86: Remove CPU cache size output for non-Intel too
As Dave Jones said about the output in intel_cacheinfo.c: "They
aren't useful, and pollute the dmesg output a lot (especially on
machines with many cores).  Also the same information can be
trivially found out from userspace."

Give the generic display_cacheinfo() function the same treatment.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Dave Jones <davej@redhat.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <adaocn6dp99.fsf_-_@roland-alpha.cisco.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-14 01:51:18 +01:00
Dave Jones
0388423dba x86: Minimise printk spew from per-vendor init code
In the default case where the kernel supports all CPU vendors,
we currently print out a bunch of not useful messages on every
system.

32-bit:
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  NSC Geode by NSC
  Cyrix CyrixInstead
  Centaur CentaurHauls
  Transmeta GenuineTMx86
  Transmeta TransmetaCPU
  UMC UMC UMC UMC

64-bit:
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls

Given that "what CPUs does the kernel support" isn't useful for
the "support everything" case, we can suppress these printk's.

Signed-off-by: Dave Jones <davej@redhat.com>
LKML-Reference: <20091113203000.GA19160@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-14 01:18:05 +01:00
Matthew Garrett
d9b263528e x86, setup: Store the boot cursor state
Add a field to store the boot cursor state and implement this for VGA on
x86. This can then be used to set the default policy for the boot console.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
LKML-Reference: <1258142222-16092-1-git-send-email-mjg@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-11-13 14:23:11 -08:00
Dave Jones
15cd8812ab x86: Remove the CPU cache size printk's
They aren't really useful, and they pollute the dmesg output a lot
(especially on machines with many cores).

Also the same information can be trivially found out from
userspace.

Reported-by: Mike Travis <travis@sgi.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091112231542.GA7129@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-13 09:14:55 +01:00
Eric W. Biederman
24a065624d sysctl x86: Remove dead binary sysctl support
Now that sys_sysctl is a generic wrapper around /proc/sys  .ctl_name
and .strategy members of sysctl tables are dead code.  Remove them.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-11-12 02:05:04 -08:00
Hiroshi Shimamoto
db48cccc7c perf_event, x86: Annotate init functions and data
Annotate init functions and data with __init and __initconst.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@gmail.com>
LKML-Reference: <4AFB721E.8070203@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-12 09:18:36 +01:00
Hidetoshi Seto
cffd377e58 x86, mce: Fix __init annotations
The intel_init_thermal() is called from resume path, so it
cannot be marked as __init.

OTOH mce_banks_init() is only called from
__mcheck_cpu_cap_init() which is marked as __cpuinit, so it can
be also marked as __cpuinit.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Yong Wang <yong.y.wang@linux.intel.com>
LKML-Reference: <4AFBB0B8.2070501@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-12 09:17:11 +01:00
Linus Torvalds
55871bdd03 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86/PCI: Adjust GFP mask handling for coherent allocations
  PCI ASPM: fix oops on root port removal
2009-11-11 11:34:14 -08:00
Linus Torvalds
605f37504f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, amd-ucode: Check UCODE_MAGIC before loading the container file
  x86: Fix error return sequence in __ioremap_caller()
  x86: Add Phoenix/MSC BIOSes to lowmem corruption list
2009-11-11 11:29:10 -08:00
Yinghai Lu
196cf0d67a x86: Make sure wakeup trampoline code is below 1MB
Instead of using bootmem, try find_e820_area()/reserve_early(),
and call acpi_reserve_memory() early, to allocate the wakeup
trampoline code area below 1M.

This is more reliable, and it also removes a dependency on
bootmem.

-v2: change function name to acpi_reserve_wakeup_memory(),
     as suggested by Rafael.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: pm list <linux-pm@lists.linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4AFA210B.3020207@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-11 20:14:32 +01:00
Andreas Herrmann
9f15226e75 x86, ucode-amd: Ensure ucode update on suspend/resume after CPU off/online cycle
When switching a CPU offline/online and then doing
suspend/resume, ucode is not updated on this CPU.

This is due to the microcode_fini_cpu() call which frees uci->mc
when setting the CPU offline:

  static void microcode_fini_cpu_amd(int cpu)
  {
          struct ucode_cpu_info *uci = ucode_cpu_info + cpu;

          vfree(uci->mc);
          uci->mc = NULL;
  }

When the CPU is set online uci->mc is still NULL because no
ucode update is required.

Finally this prevents ucode update when resuming after suspend:

  static enum ucode_state microcode_resume_cpu(int cpu)
  {
        struct ucode_cpu_info *uci = ucode_cpu_info + cpu;

        if (!uci->mc)
                return UCODE_NFOUND;

        ...
  }

Fix is to check whether uci->mc is valid before
microcode_resume_cpu() is called.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: dimm <dmitry.adamushko@gmail.com>
LKML-Reference: <20091111190329.GF18592@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-11 20:06:54 +01:00
FUJITA Tomonori
b18485e7ac swiotlb: Remove the swiotlb variable usage
POWERPC doesn't expect it to be used.

This fixes the linux-next build failure reported by
Stephen Rothwell:

  lib/swiotlb.c: In function 'setup_io_tlb_npages':
  lib/swiotlb.c:114: error: 'swiotlb' undeclared (first use in this function)

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: peterz@infradead.org
LKML-Reference: <20091112000258F.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-11 16:51:18 +01:00
Yong Wang
ce6b5d768c x86: Mark the thermal init functions __init
Mark the thermal init functions __init so that the init memory
can be freed.

Signed-off-by: Yong Wang <yong.y.wang@intel.com>
LKML-Reference: <20091111075125.GA17900@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-11 12:33:32 +01:00
Randy Dunlap
e84446de5c x86 VSDO: Fix Kconfig help
COMPAT_VDSO has 2 help text blocks, but kconfig only uses the
last one found, so merge the 2 blocks.

It would be real nice if kconfig would warn about this.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <4AF9FB6C.70003@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-11 07:26:41 +01:00
Dimitri Sivanich
200a9ae280 x86: Remove asm/apicnum.h
arch/x86/include/asm/apicnum.h is not referenced anywhere
anymore. Its definitions appear in apicdef.h. Remove it.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Mike Travis <travis@sgi.com>
LKML-Reference: <20091110195835.GA4393@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 22:07:35 +01:00
Dave Jones
e02e0e1a13 x86: Fix typo in Intel CPU cache size descriptor
I double-checked the datasheet. One of the existing
descriptors has a typo: it should be 2MB not 2038 KB.

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: <stable@kernel.org> # .3x.x: 85160b9: x86: Add new Intel CPU cache size descriptors
Cc: <stable@kernel.org> # .3x.x
LKML-Reference: <20091110200120.GA27090@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 21:52:32 +01:00
Dave Jones
85160b92fb x86: Add new Intel CPU cache size descriptors
The latest rev of Intel doc AP-485 details new cache descriptors
that we don't yet support. 12MB, 18MB and 24MB 24-way assoc L3
caches.

Signed-off-by: Dave Jones <davej@redhat.com>
LKML-Reference: <20091110184924.GA20337@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 20:06:16 +01:00
Ingo Molnar
b4941a9a60 x86: Add iommu_init to x86_init_ops, fix build
Most of the time x86_init.h is included in pci-dma.c - but not always,
leading to this rare build failure:

arch/x86/kernel/pci-dma.c:296: error: 'x86_init' undeclared (first use in this function)

So include asm/x86_init.h explicitly.

Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 14:37:58 +01:00
FUJITA Tomonori
72d03802b8 x86, 32-bit: Fix swiotlb boot crash
Ingo Molnar reported this boot crash:

[    8.655620] pata_amd 0000:00:06.0: version 0.4.1
[    8.660286] BUG: unable to handle kernel NULL pointer dereference at 00000034
[    8.663572] IP: [<c100617b>] dma_supported+0x3b/0xa4
[    8.663572] *pde = 00000000

Initialize dma_ops properly in the 32-bit case.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 14:11:32 +01:00
FUJITA Tomonori
75f1cdf1dd x86: Handle HW IOMMU initialization failure gracefully
If HW IOMMU initialization fails (Intel VT-d often does this,
typically due to BIOS bugs), we fall back to nommu. It doesn't
work for the majority since nowadays we have more than 4GB
memory so we must use swiotlb instead of nommu.

The problem is that it's too late to initialize swiotlb when HW
IOMMU initialization fails. We need to allocate swiotlb memory
earlier from bootmem allocator. Chris explained the issue in
detail:

  http://marc.info/?l=linux-kernel&m=125657444317079&w=2

The current x86 IOMMU initialization sequence is too complicated
and handling the above issue makes it more hacky.

This patch changes x86 IOMMU initialization sequence to handle
the above issue cleanly.

The new x86 IOMMU initialization sequence are:

1. we initialize the swiotlb (and setting swiotlb to 1) in the case
   of (max_pfn > MAX_DMA32_PFN && !no_iommu). dma_ops is set to
   swiotlb_dma_ops or nommu_dma_ops. if swiotlb usage is forced by
   the boot option, we finish here.

2. we call the detection functions of all the IOMMUs

3. the detection function sets x86_init.iommu.iommu_init to the
   IOMMU initialization function (so we can avoid calling the
   initialization functions of all the IOMMUs needlessly).

4. if the IOMMU initialization function doesn't need to swiotlb
   then sets swiotlb to zero (e.g. the initialization is
   sucessful).

5. if we find that swiotlb is set to zero, we free swiotlb
   resource.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-10-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 12:32:07 +01:00
FUJITA Tomonori
ad32e8cb86 swiotlb: Defer swiotlb init printing, export swiotlb_print_info()
This enables us to avoid printing swiotlb memory info when we
initialize swiotlb. After swiotlb initialization, we could find
that we don't need swiotlb.

This patch removes the code to print swiotlb memory info in
swiotlb_init() and exports the function to do that.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
Cc: tony.luck@intel.com
Cc: benh@kernel.crashing.org
LKML-Reference: <1257849980-22640-9-git-send-email-fujita.tomonori@lab.ntt.co.jp>
[ -v2: merge up conflict ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 12:32:00 +01:00
FUJITA Tomonori
9d5ce73a64 x86: intel-iommu: Convert detect_intel_iommu to use iommu_init hook
This changes detect_intel_iommu() to set intel_iommu_init() to
iommu_init hook if detect_intel_iommu() finds the IOMMU.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-6-git-send-email-fujita.tomonori@lab.ntt.co.jp>
[ -v2: build fix for the !CONFIG_DMAR case ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 12:31:36 +01:00
FUJITA Tomonori
ea1b0d3945 x86: amd_iommu: Convert amd_iommu_detect() to use iommu_init hook
This changes amd_iommu_detect() to set amd_iommu_init to
iommu_init hook if amd_iommu_detect() finds the AMD IOMMU.

We can kill the code to check if we found the IOMMU in
amd_iommu_init() since amd_iommu_detect() sets amd_iommu_init()
only when it found the IOMMU.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-5-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 12:31:30 +01:00
FUJITA Tomonori
de957628ce x86: GART: Convert gart_iommu_hole_init() to use iommu_init hook
This changes gart_iommu_hole_init() to set gart_iommu_init() to
iommu_init hook if gart_iommu_hole_init() finds the GART IOMMU.

We can kill the code to check if we found the IOMMU in
gart_iommu_init() since gart_iommu_hole_init() sets
gart_iommu_init() only when it found the IOMMU.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-4-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 12:31:23 +01:00
FUJITA Tomonori
d7b9f7be21 x86: Calgary: Convert detect_calgary() to use iommu_init hook
This changes detect_calgary() to set init_calgary() to
iommu_init hook if detect_calgary() finds the Calgary IOMMU.

We can kill the code to check if we found the IOMMU in
init_calgary() since detect_calgary() sets init_calgary() only
when it found the IOMMU.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
LKML-Reference: <1257849980-22640-3-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 12:31:15 +01:00
FUJITA Tomonori
d07c1be069 x86: Add iommu_init to x86_init_ops
We call the detections functions of all the IOMMUs then all
their initialization functions. The latter is pointless since we
don't detect multiple different IOMMUs. What we need to do is
calling the initialization function of the detected IOMMU.

This adds iommu_init hook to x86_init_ops so if an IOMMU
detection function can set its initialization function to the
hook.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 12:31:07 +01:00
Andreas Herrmann
1a74357066 x86: ucode-amd: Convert printk(KERN_*...) to pr_*(...)
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: dimm <dmitry.adamushko@gmail.com>
LKML-Reference: <20091110110920.GJ30802@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 12:15:50 +01:00
Andreas Herrmann
14c569425a x86: ucode-amd: Don't warn when no ucode is available for a CPU revision
There is no point in warning when there is no ucode available
for a specific CPU revision. Currently the container-file, which
provides the AMD ucode patches for OS load, contains only a few
ucode patches.

It's already clearly indicated by the printed patch_level
whenever new ucode was available and an update happened. So the
warning message is of no help but rather annoying on systems
with many CPUs.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: dimm <dmitry.adamushko@gmail.com>
LKML-Reference: <20091110110825.GI30802@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 12:15:49 +01:00
Andreas Herrmann
d1c84f79a6 x86: ucode-amd: Load ucode-patches once and not separately of each CPU
This also implies that corresponding log messages, e.g.

  platform microcode: firmware: requesting amd-ucode/microcode_amd.bin

show up only once on module load and not when ucode is updated
for each CPU.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: dimm <dmitry.adamushko@gmail.com>
LKML-Reference: <20091110110723.GH30802@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 12:15:48 +01:00
Frederic Weisbecker
59d8eb53ea hw-breakpoints: Wrap in the KVM breakpoint active state check
Wrap in the cpu dr7 check that tells if we have active
breakpoints that need to be restored in the cpu.

This wrapper makes the check more self-explainable and also
reusable for any further other uses.

Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
2009-11-10 11:23:43 +01:00
Frederic Weisbecker
9f6b3c2c30 hw-breakpoints: Fix broken a.out format dump
Fix the broken a.out format dump. For now we only dump the ptrace
breakpoints.

TODO: Dump every perf breakpoints for the current thread, not only
ptrace based ones.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
2009-11-10 11:23:05 +01:00
Xiaotian Feng
2fb8f4e6a8 x86: pat: Remove ioremap_default()
Commit:

  b6ff32d: x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype()

consolidated reserve_memtype() and pat_x_mtrr_type,
this made ioremap_default() same as ioremap_cache().

Remove the redundant function and change the only caller to use
ioremap_cache.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <1257845005-7938-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 10:34:05 +01:00
Xiaotian Feng
83ea05ea69 x86: pat: Clean up req_type special case for reserve_memtype()
Commit:

  b6ff32d: x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype()

consolidated code in pat_x_mtrr_type() and reserve_memtype(),
which removed the special case (req_type is -1) for the
PAT-enabled part.

We should also change comments and the PAT-disabled part.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <1257844987-7906-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 10:34:04 +01:00
Joe Perches
41855b7754 x86: GART: pci-gart_64.c: Use correct length in strncmp
Signed-off-by: Joe Perches <joe@perches.com>
Cc: <stable@kernel.org> # .3x.x
LKML-Reference: <1257818330.12852.72.camel@Joe-Laptop.home>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 06:05:39 +01:00
Yong Wang
a2202aa292 x86: Under BIOS control, restore AP's APIC_LVTTHMR to the BSP value
On platforms where the BIOS handles the thermal monitor interrupt,
APIC_LVTTHMR on each logical CPU is programmed to generate a SMI
and OS must not touch it.

Unfortunately AP bringup sequence using INIT-SIPI-SIPI clears all
the LVT entries except the mask bit. Essentially this results in
all LVT entries including the thermal monitoring interrupt set
to masked (clearing the bios programmed value for APIC_LVTTHMR).

And this leads to kernel take over the thermal monitoring
interrupt on AP's but not on BSP (leaving the bios programmed
value only on BSP).

As a result of this, we have seen system hangs when the thermal
monitoring interrupt is generated.

Fix this by reading the initial value of thermal LVT entry on
BSP and if bios has taken over the control, then program the
same value on all AP's and leave the thermal monitoring
interrupt control on all the logical cpu's to the bios.

Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Arjan van de Ven <arjan@infradead.org>
LKML-Reference: <20091110013824.GA24940@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
2009-11-10 05:57:55 +01:00
Cyrill Gorcunov
7abc075313 x86: apic: Do not use stacked physid_mask_t
We should not use physid_mask_t as a stack based
variable in apic code. This type depends on MAX_APICS
parameter which may be huge enough.

Especially it became a problem with apic NOOP driver which
is portable between 32 bit and 64 bit environment
(where we have really huge MAX_APICS).

So apic driver should operate with pointers and a caller
in turn should aware of allocation physid_mask_t variable.

As a side (but positive) effect -- we may use already
implemented physid_set_mask_of_physid function eliminating
default_apicid_to_cpu_present completely.

Note that physids_coerce and physids_promote turned into static
inline from macro (since macro hides the fact that parameter is
being interpreted as unsigned long, make it explicit).

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20091109220659.GA5568@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 05:52:07 +01:00
Andreas Herrmann
6e18da75c2 x86, amd-ucode: Remove needless log messages
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20091029134742.GD30802@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 05:49:28 +01:00
Borislav Petkov
506f90eeae x86, amd-ucode: Check UCODE_MAGIC before loading the container file
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20091029134552.GC30802@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10 05:46:09 +01:00
Huang Ying
fd650a6394 x86: Generate .byte code for some new instructions via gas macro
It will take some time for binutils (gas) to support some newly added
instructions, such as SSE4.1 instructions or the AES-NI instructions
found in upcoming Intel CPU.

To make the source code can be compiled by old binutils, .byte code is
used instead of the assembly instruction. But the readability and
flexibility of raw .byte code is not good.

This patch solves the issue of raw .byte code via generating it via
assembly instruction like gas macro. The syntax is as close as
possible to real assembly instruction.

Some helper macros such as MODRM is not a full feature
implementation. It can be extended when necessary.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2009-11-09 13:52:26 -05:00
Cyrill Gorcunov
f4a70c5537 x86, apic: Get rid of apicid_to_cpu_present assign on 64-bit
In fact it's never get used on x86-64 (for 64 bit platform
we use differ technique to enumerate io-units).

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091108131645.GD5300@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 19:46:17 +01:00
Cyrill Gorcunov
4343fe1024 x86, ioapic: Use snrpintf while set names for IO-APIC resourses
We should be ready that one day MAX_IO_APICS may raise its
number. To prevent memory overwrite we're to use safe
snprintf while set IO-APIC resourse name.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20091108155431.GC25940@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 17:06:23 +01:00
Cyrill Gorcunov
46dc281b1b x86, apic: Use PAGE_SIZE instead of numbers
The whole page is reserved for IO-APIC fixmap
due to non-cacheable requirement. So lets note
this explicitly instead of playing with numbers.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
LKML-Reference: <20091108155356.GB25940@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 17:06:22 +01:00
Jan Beulich
eb647138ac x86/PCI: Adjust GFP mask handling for coherent allocations
Rather than forcing GFP flags and DMA mask to be inconsistent,
GFP flags should be determined even for the fallback device
through dma_alloc_coherent_mask()/dma_alloc_coherent_gfp_flags().

This restores 64-bit behavior as it was prior to commits
8965eb1938 and
4a367f3a9d (not sure why there are
two of them), where GFP_DMA was forced on for 32-bit, but not
for 64-bit, with the slight adjustment that afaict even 32-bit
doesn't need this without CONFIG_ISA.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
LKML-Reference: <4AF18187020000780001D8AA@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-08 07:44:30 -08:00
Frederic Weisbecker
24f1e32c60 hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events
This patch rebase the implementation of the breakpoints API on top of
perf events instances.

Each breakpoints are now perf events that handle the
register scheduling, thread/cpu attachment, etc..

The new layering is now made as follows:

       ptrace       kgdb      ftrace   perf syscall
          \          |          /         /
           \         |         /         /
                                        /
            Core breakpoint API        /
                                      /
                     |               /
                     |              /

              Breakpoints perf events

                     |
                     |

               Breakpoints PMU ---- Debug Register constraints handling
                                    (Part of core breakpoint API)
                     |
                     |

             Hardware debug registers

Reasons of this rewrite:

- Use the centralized/optimized pmu registers scheduling,
  implying an easier arch integration
- More powerful register handling: perf attributes (pinned/flexible
  events, exclusive/non-exclusive, tunable period, etc...)

Impact:

- New perf ABI: the hardware breakpoints counters
- Ptrace breakpoints setting remains tricky and still needs some per
  thread breakpoints references.

Todo (in the order):

- Support breakpoints perf counter events for perf tools (ie: implement
  perf_bpcounter_event())
- Support from perf tools

Changes in v2:

- Follow the perf "event " rename
- The ptrace regression have been fixed (ptrace breakpoint perf events
  weren't released when a task ended)
- Drop the struct hw_breakpoint and store generic fields in
  perf_event_attr.
- Separate core and arch specific headers, drop
  asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
- Use new generic len/type for breakpoint
- Handle off case: when breakpoints api is not supported by an arch

Changes in v3:

- Fix broken CONFIG_KVM, we need to propagate the breakpoint api
  changes to kvm when we exit the guest and restore the bp registers
  to the host.

Changes in v4:

- Drop the hw_breakpoint_restore() stub as it is only used by KVM
- EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
  module
- Restore the breakpoints unconditionally on kvm guest exit:
  TIF_DEBUG_THREAD doesn't anymore cover every cases of running
  breakpoints and vcpu->arch.switch_db_regs might not always be
  set when the guest used debug registers.
  (Waiting for a reliable optimization)

Changes in v5:

- Split-up the asm-generic/hw-breakpoint.h moving to
  linux/hw_breakpoint.h into a separate patch
- Optimize the breakpoints restoring while switching from kvm guest
  to host. We only want to restore the state if we have active
  breakpoints to the host, otherwise we don't care about messed-up
  address registers.
- Add asm/hw_breakpoint.h to Kbuild
- Fix bad breakpoint type in trace_selftest.c

Changes in v6:

- Fix wrong header inclusion in trace.h (triggered a build
  error with CONFIG_FTRACE_SELFTEST

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kiszka <jan.kiszka@web.de>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
2009-11-08 15:34:42 +01:00
Tejun Heo
2ae8bb75db x86: Fix iommu=nodac parameter handling
iommu=nodac should forbid dac instead of enabling it. Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Matteo Frigo <athena@fftw.org>
Cc: <stable@kernel.org> # .32.x and older
LKML-Reference: <4AE5B52A.4050408@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 13:19:05 +01:00
FUJITA Tomonori
338bac527e x86: Use x86_platform for iommu_shutdown
This patch cleans up pci_iommu_shutdown() a bit to use
x86_platform (similar to how IA64 initializes an IOMMU driver).

This adds iommu_shutdown() to x86_platform to avoid calling
every IOMMUs' shutdown functions in pci_iommu_shutdown() in
order. The IOMMU shutdown functions are platform specific (we
don't have multiple different IOMMU hardware) so the current way
is pointless.

An IOMMU driver sets x86_platform.iommu_shutdown to the shutdown
function if necessary.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: joerg.roedel@amd.com
LKML-Reference: <20091027163358F.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 13:12:26 +01:00
Randy Dunlap
0420101c07 x86: k8.h: Add struct bootnode
k8.h uses struct bootnode but does not #include a header file
for it, so provide a simple declaration for it.

  arch/x86/include/asm/k8.h:13: warning: 'struct bootnode'
  declared inside parameter list arch/x86/include/asm/k8.h:13:
  warning: its scope is only this definition or declaration, which is probably not what you want

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20091028160955.d27ccb16.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 12:55:38 +01:00
Xiaotian Feng
de2a47cf2b x86: Fix error return sequence in __ioremap_caller()
kernel missed to free memtype if get_vm_area_caller failed in
__ioremap_caller.

This patch introduces error path to fix this and cleans up the
repetitive error return sequences that contributed to the
creation of the bug.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1257389031-20429-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 12:48:58 +01:00
Rusty Russell
0d0fbbddcc x86, msr, cpumask: Use struct cpumask rather than the deprecated cpumask_t
This makes the declarations match the definitions, which already
use 'struct cpumask'.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <200911052245.41803.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 11:58:38 +01:00
Masami Hiramatsu
c12a229bc5 x86: Remove unused thread_return label from switch_to()
Remove unused thread_return label from switch_to() macro on
x86-64. Since this symbol cuts into schedule(), backtrace at the
latter half of schedule() was always shown as thread_return().

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
LKML-Reference: <20091105160359.5181.26225.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08 11:57:13 +01:00
Simon Kagstrom
f1b291d4c4 x86: Add Phoenix/MSC BIOSes to lowmem corruption list
We have a board with a Phoenix/MSC BIOS which also corrupts the low
64KB of RAM, so add an entry to the table.

Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
LKML-Reference: <20091106154404.002648d9@marrow.netinsight.se>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-11-06 14:49:39 -08:00
Eric W. Biederman
c3359fbce4 sysctl: x86 Use the compat_sys_sysctl
Now that we have a generic 32bit compatibility implementation
there is no need for x86 to implement it's own.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-11-06 03:53:58 -08:00
Frederic Weisbecker
2da3e160cb hw-breakpoint: Move asm-generic/hw_breakpoint.h to linux/hw_breakpoint.h
We plan to make the breakpoints parameters generic among architectures.
For that it's better to move the asm-generic header to a generic linux
header.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-11-05 23:48:01 +01:00
Linus Torvalds
7c9abfb884 Merge branch 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: get_tss_base_addr() should return a gpa_t
  KVM: x86: Catch potential overrun in MCE setup
2009-11-05 13:24:15 -08:00
Chris Lalancette
2c75910f1a x86: Make sure get_user_desc() doesn't sign extend.
The current implementation of get_user_desc() sign extends the return
value because of integer promotion rules.  For the most part, this
doesn't matter, because the top bit of base2 is usually 0.  If, however,
that bit is 1, then the entire value will be 0xffff...  which is
probably not what the caller intended.

This patch casts the entire thing to unsigned before returning, which
generates almost the same assembly as the current code but replaces the
final "cltq" (sign extend) with a "mov %eax %eax" (zero-extend).  This
fixes booting certain guests under KVM.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-05 13:22:18 -08:00
Linus Torvalds
9a6fc8d0f8 Merge branch 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen: mask extended topology info in cpuid
  xen/hvc: make sure console output is always emitted, with explicit polling
2009-11-05 10:58:07 -08:00
Linus Torvalds
608221fdf9 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix kthread_bind() by moving the body of kthread_bind() to sched.c
  sched: Disable SD_PREFER_LOCAL at node level
  sched: Fix boot crash by zalloc()ing most of the cpu masks
  sched: Strengthen buddies and mitigate buddy induced latencies
2009-11-05 10:56:47 -08:00
Linus Torvalds
411094acb7 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, fs: Fix x86 procfs stack information for threads on 64-bit
  x86: Add reboot quirk for 3 series Mac mini
  x86: Fix printk message typo in mtrr cleanup code
  dma-debug: Fix compile warning with PAE enabled
  x86/amd-iommu: Un__init function required on shutdown
  x86/amd-iommu: Workaround for erratum 63
2009-11-05 10:54:08 -08:00
Gleb Natapov
abb3911965 KVM: get_tss_base_addr() should return a gpa_t
If TSS we are switching to resides in high memory task switch will fail
since address will be truncated. Windows2k3 does this sometimes when
running with more then 4G

Cc: stable@kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-11-04 12:42:36 -02:00
Jan Kiszka
a9e38c3e01 KVM: x86: Catch potential overrun in MCE setup
We only allocate memory for 32 MCE banks (KVM_MAX_MCE_BANKS) but we
allow user space to fill up to 255 on setup (mcg_cap & 0xff), corrupting
kernel memory. Catch these overflows.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-11-04 12:42:35 -02:00
Stefani Seibold
89240ba059 x86, fs: Fix x86 procfs stack information for threads on 64-bit
This patch fixes two issues in the procfs stack information on
x86-64 linux.

The 32 bit loader compat_do_execve did not store stack
start. (this was figured out by Alexey Dobriyan).

The stack information on a x64_64 kernel always shows 0 kbyte
stack usage, because of a missing implementation of the KSTK_ESP
macro which always returned -1.

The new implementation now returns the right value.

Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1257240160.4889.24.camel@wall-e>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-04 13:25:03 +01:00
Rusty Russell
d7d3756c5b cpumask: Use modern cpumask style in Xen
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
LKML-Reference: <200911031458.38406.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-04 13:19:50 +01:00
Rusty Russell
ce7c42710e cpumask: Avoid cpumask_t in arch/x86/kernel/apic/nmi.c
Ingo wants the certainty of a static cpumask (rather than a
cpumask_var_t), but cpumask_t will some day be undefined to
avoid on-stack declarations.

This is what DECLARE_BITMAP/to_cpumask() is for.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <200911031453.52394.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-04 13:17:53 +01:00
Hiroshi Shimamoto
09879b99d4 x86: Gitignore: arch/x86/lib/inat-tables.c
Ignore generated file arch/x86/lib/inat-tables.c.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <4AF0FBD7.7000501@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-04 13:11:28 +01:00
Ingo Molnar
a2e7127153 Merge commit 'v2.6.32-rc6' into perf/core
Conflicts:
	tools/perf/Makefile

Merge reason: Resolve the conflict, merge to upstream and merge in
              perf fixes so we can add a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-04 11:59:45 +01:00
Brian Gerst
97829de5a3 x86, 64-bit: Fix bstep_iret jump
This jump should be unconditional.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1257274925-15713-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-03 20:50:02 +01:00
Jeremy Fitzhardinge
82d6469916 xen: mask extended topology info in cpuid
A Xen guest never needs to know about extended topology, and knowing
would just confuse it.

This patch just zeros ebx in leaf 0xb which indicates no topology info,
preventing a crash under Xen on cpus which support this leaf.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-11-03 11:09:12 -08:00
Paul Mundt
41a48d14f6 x86/hw-breakpoints: Actually flush thread breakpoints in flush_thread().
flush_thread() tries to do a TIF_DEBUG check before calling in to
flush_thread_hw_breakpoint() (which subsequently clears the thread flag),
but for some reason, the x86 code is manually clearing TIF_DEBUG
immediately before the test, so this path will never be taken.

This kills off the erroneous clear_tsk_thread_flag() and lets
flush_thread_hw_breakpoint() actually get invoked.

Presumably folks were getting lucky with testing and the
free_thread_info() -> free_thread_xstate() path was taking care of the
flush there.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: "K.Prasad" <prasad@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alan Stern <stern@rowland.harvard.edu>
LKML-Reference: <20091005102306.GA7889@linux-sh.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-11-03 18:05:44 +01:00
Huang Ying
01dd958277 crypto: ghash-intel - Fix irq_fpu_usable usage
When renaming kernel_fpu_using to irq_fpu_usable, the semantics of the
function is changed too, from mesuring whether kernel is using FPU,
that is, the FPU is NOT available, to measuring whether FPU is usable,
that is, the FPU is available.

But the usage of irq_fpu_usable in ghash-clmulni-intel_glue.c is not
changed accordingly. This patch fixes this.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2009-11-03 10:55:20 -05:00
Ingo Molnar
1d87cff407 Merge branch 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2009-11-03 16:54:14 +01:00
Arjan van de Ven
a489ca355e x86: Make sure we also print a Code: line for show_regs()
show_regs() is called as a mini BUG() equivalent in some places,
specifically for the "scheduling while atomic" case.

Unfortunately right now it does not print a Code: line unlike
a real bug/oops.

This patch changes the x86 implementation of show_regs() so that
it calls the same function as oopses do to print the registers
as well as the Code: line.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <20091102165915.4a980fc0@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-03 16:50:22 +01:00
Herbert Xu
3b0d65969b crypto: ghash-intel - Add PSHUFB macros
Add PSHUFB macros instead of repeating byte sequences, suggested
by Ingo.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-11-03 09:11:15 -05:00
Joerg Roedel
342688f9db Merge branches 'amd-iommu/fixes' and 'dma-debug/fixes' into iommu/fixes 2009-11-03 12:05:40 +01:00
Mike Galbraith
6b9de613ae sched: Disable SD_PREFER_LOCAL at node level
Yanmin Zhang reported that SD_PREFER_LOCAL induces an order of
magnitude increase in select_task_rq_fair() overhead while
running heavy wakeup benchmarks (tbench and vmark).

Since SD_BALANCE_WAKE is off at node level, turn SD_PREFER_LOCAL
off as well pending further investigation.

Reported-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-03 07:24:07 +01:00
Linus Torvalds
efcd9e0b91 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Make EFI RTC function depend on 32bit again
  x86-64: Fix register leak in 32-bit syscall audting
  x86: crash_dump: Fix non-pae kdump kernel memory accesses
  x86: Side-step lguest problem by only building cmpxchg8b_emu for pre-Pentium
  x86: Remove STACKPROTECTOR_ALL
2009-11-02 09:45:17 -08:00
Suresh Siddha
e7d23dde9b x86_64, cpa: Use only text section in set_kernel_text_rw/ro
set_kernel_text_rw()/set_kernel_text_ro() are marking pages
starting from _text to __start_rodata as RW or RO.

With CONFIG_DEBUG_RODATA, there might be free pages (associated
with padding the sections to 2MB large page boundary) between
text and rodata sections that are given back to page allocator.
So we should use only use the start (__text) and end
(__stop___ex_table) of the text section in
set_kernel_text_rw()/set_kernel_text_ro().

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091029024821.164525222@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02 17:17:24 +01:00
Suresh Siddha
55ca3cc174 x86_64, ftrace: Make ftrace use kernel identity mapping to modify code
On x86_64, kernel text mappings are mapped read-only with
CONFIG_DEBUG_RODATA. So use the kernel identity mapping instead
of the kernel text mapping to modify the kernel text.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091029024821.080941108@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02 17:16:36 +01:00
Suresh Siddha
502f660466 x86, cpa: Fix kernel text RO checks in static_protection()
Steven Rostedt reported that we are unconditionally making the
kernel text mapping as read-only. i.e., if someone does cpa() to
the kernel text area for setting/clearing any page table
attribute, we unconditionally clear the read-write attribute for
the kernel text mapping that is set at compile time.

We should delay (to forbid the write attribute) and enforce only
after the kernel has mapped the text as read-only.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091029024820.996634347@sbs-t61.sc.intel.com>
[ marked kernel_set_to_readonly as __read_mostly ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02 17:16:35 +01:00
Suresh Siddha
5231a68614 x86: Remove local_irq_enable()/local_irq_disable() in fixup_irqs()
To ensure that we handle all the pending interrupts (destined
for this cpu that is going down) in the interrupt subsystem
before the cpu goes offline, fixup_irqs() does:

	local_irq_enable();
	mdelay(1);
	local_irq_disable();

Enabling interrupts is not a good thing as this cpu is already
offline. So this patch replaces that logic with,

	mdelay(1);
	check APIC_IRR bits
	Retrigger the irq at the new destination if any interrupt has arrived
	via IPI.

For IO-APIC level triggered interrupts, this retrigger IPI will
appear as an edge interrupt. ack_apic_level() will detect this
condition and IO-APIC RTE's remoteIRR is cleared using directed
EOI(using IO-APIC EOI register) on Intel platforms and for
others it uses the existing mask+edge logic followed by
unmask+level.

We can also remove mdelay() and then send spuriuous interrupts
to new cpu targets for all the irqs that were handled previously
by this cpu that is going offline. While it works, I have seen
spurious interrupt messages (nothing wrong but still annoying
messages during cpu offline, which can be seen during
suspend/resume etc)

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Gary Hade <garyhade@us.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <20091026230002.043281924@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02 15:56:37 +01:00
Suresh Siddha
b3ec0a37a7 x86: Use EOI register in io-apic on intel platforms
IO-APIC's in intel chipsets support EOI register starting from
IO-APIC version 2. Use that when ever we need to clear the
IO-APIC RTE's RemoteIRR bit explicitly.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Gary Hade <garyhade@us.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <20091026230001.947855317@sbs-t61.sc.intel.com>
[ Marked use_eio_reg as __read_mostly, fixed small details ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02 15:56:36 +01:00
Suresh Siddha
a5e74b8419 x86: Force irq complete move during cpu offline
When a cpu goes offline, fixup_irqs() try to move irq's
currently destined to the offline cpu to a new cpu. But this
attempt will fail if the irq is recently moved to this cpu and
the irq still hasn't arrived at this cpu (for non intr-remapping
platforms this is when we free the vector allocation at the
previous destination) that is about to go offline.

This will endup with the interrupt subsystem still pointing the
irq to the offline cpu, causing that irq to not work any more.

Fix this by forcing the irq to complete its move (its been a
long time we moved the irq to this cpu which we are offlining
now) and then move this irq to a new cpu before this cpu goes
offline.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Gary Hade <garyhade@us.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <20091026230001.848830905@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02 15:56:36 +01:00
Suresh Siddha
23359a88e7 x86: Remove move_cleanup_count from irq_cfg
move_cleanup_count for each irq in irq_cfg is keeping track of
the total number of cpus that need to free the corresponding
vectors associated with the irq which has now been migrated to
new destination. As long as this move_cleanup_count is non-zero
(i.e., as long as we have n't freed the vector allocations on
the old destinations) we were preventing the irq's further
migration.

This cleanup count is unnecessary and it is enough to not allow
the irq migration till we send the cleanup vector to the
previous irq destination, for which we already have irq_cfg's
move_in_progress.  All we need to make sure is that we free the
vector at the old desintation but we don't need to wait till
that gets freed.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Gary Hade <garyhade@us.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <20091026230001.752968906@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02 15:56:35 +01:00
Suresh Siddha
84e21493a3 x86, intr-remap: Avoid irq_chip mask/unmask in fixup_irqs() for intr-remapping
In the presence of interrupt-remapping, irqs will be migrated in
the process context and we don't do (and there is no need to)
irq_chip mask/unmask while migrating the interrupt.

Similarly fix the fixup_irqs() that get called during cpu
offline and avoid calling irq_chip mask/unmask for irqs that are
ok to be migrated in the process context.

While we didn't observe any race condition with the existing
code, this change takes complete advantage of
interrupt-remapping in the newer generation platforms and avoids
any potential HW lockup's (that often worry Eric :)

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: garyhade@us.ibm.com
LKML-Reference: <20091026230001.661423939@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02 15:56:35 +01:00
Suresh Siddha
7a7732bc0f x86: Unify fixup_irqs() for 32-bit and 64-bit kernels
There is no reason to have different fixup_irqs() for 32-bit and
64-bit kernels. Unify by using the superior 64-bit version for
both the kernels.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <20091026230001.562512739@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02 15:56:34 +01:00
Gottfried Haider
05154752cf x86: Add reboot quirk for 3 series Mac mini
Reboot does not work out of the box on my "Early 2009" Mac mini
(3,1). Detect this machine via DMI as we do for recent MacBooks.

Signed-off-by: Gottfried Haider <gottfried.haider@gmail.com>
Cc: Ozan Çağlayan <ozan@pardus.org.tr>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02 15:46:17 +01:00
Dave Jones
16121d70fd x86: Fix printk message typo in mtrr cleanup code
Trivial typo.

Signed-off-by: Dave Jones <davej@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02 08:36:18 +01:00
Herbert Xu
2d06ef7f42 crypto: ghash-intel - Hard-code pshufb
Old gases don't have a clue what pshufb stands for so we have
to hard-code it for now.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2009-11-01 12:49:44 -05:00
Linus Torvalds
2e2ec95235 Merge branch 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen: set up mmu_ops before trying to set any ptes
2009-10-29 15:03:36 -07:00
Linus Torvalds
6e958d73c2 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Do less agressive buddy clearing
  sched: Disable SD_PREFER_LOCAL for MC/CPU domains
2009-10-29 08:10:38 -07:00
Linus Torvalds
7811a32407 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, UV: Set DELIVERY_MODE=4 for vector=NMI_VECTOR in uv_hub_send_ipi()
  x86, UV: Fix and clean up bau code to use uv_gpa_to_pnode()
  x86: Don't print number of MCE banks for every CPU
  x86, UV: Fix information in __uv_hub_info structure
  x86: Document linker script ASSERT() quirk
2009-10-29 08:10:26 -07:00
Ingo Molnar
9de09ace8d Merge branch 'tracing/urgent' into tracing/core
Merge reason: Pick up fixes and move base from -rc1 to -rc5.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-29 09:02:20 +01:00
Masami Hiramatsu
3f7e454af1 x86: Add Intel FMA instructions to x86 opcode map
Add Intel FMA(FUSED-MULTIPLY-ADD) instructions to x86 opcode map
for x86 instruction decoder.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
LKML-Reference: <20091027204235.30545.33997.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-29 08:47:47 +01:00
Masami Hiramatsu
e0e492e99b x86: AVX instruction set decoder support
Add Intel AVX(Advanced Vector Extensions) instruction set
support to x86 instruction decoder. This adds insn.vex_prefix
field for storing VEX prefixes, and introduces some original
tags for expressing opcodes attributes.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
LKML-Reference: <20091027204226.30545.23451.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-29 08:47:46 +01:00
Masami Hiramatsu
82cb57028c x86: Add pclmulq to x86 opcode map
Add pclmulq opcode to x86 opcode map.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
LKML-Reference: <20091027204219.30545.82039.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-29 08:47:46 +01:00
Masami Hiramatsu
04d46c1b13 x86: Merge INAT_REXPFX into INAT_PFX_*
Merge INAT_REXPFX into INAT_PFX_* macro and rename it to
INAT_PFX_REX.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
LKML-Reference: <20091027204211.30545.58090.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-29 08:47:45 +01:00
Masami Hiramatsu
7f387d3f24 x86: Fix SSE opcode map bug
Fix superscripts position because some superscripts of SSE
opcode are not put in correct position.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
LKML-Reference: <20091027204204.30545.97296.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-29 08:47:45 +01:00
Joerg Roedel
ca0207114f x86/amd-iommu: Un__init function required on shutdown
The function iommu_feature_disable is required on system
shutdown to disable the IOMMU but it is marked as __init.
This may result in a panic if the memory is reused. This
patch fixes this bug.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-10-28 18:02:26 +01:00
Jeremy Fitzhardinge
973df35ed9 xen: set up mmu_ops before trying to set any ptes
xen_setup_stackprotector() ends up trying to set page protections,
so we need to have vm_mmu_ops set up before trying to do so.
Failing to do so causes an early boot crash.

[ Impact: Fix early crash under Xen. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-10-27 16:54:19 -07:00
Steven Rostedt
883242dd0e tracing: allow to change permissions for text with dynamic ftrace enabled
The commit 74e081797b
x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA
prevents text sections from becoming read/write using set_memory_rw.

The dynamic ftrace changes all text pages to read/write just before
converting the calls to tracing to nops, and vice versa.

I orginally just added a flag to allow this transaction when ftrace
did the change, but I also found that when the CPA testing was running
it would remove the read/write as well, and ftrace does not do the text
conversion on boot up, and the CPA changes caused the dynamic tracer
to fail on self tests.

The current solution I have is to simply not to prevent
change_page_attr from setting the RW bit for kernel text pages.

Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-10-27 13:15:11 -04:00
Andreas Herrmann
6f9b41006a x86, apic: Clear APIC Timer Initial Count Register on shutdown
Commit a98f8fd24f (x86: apic reset
counter on shutdown) set the counter to max to avoid spurious
interrupts when the timer is re-enabled.

(In theory) you'll still get a spurious interrupt if spending
more than 344 seconds with this interrupt disabled and then
unmasking it.

The right thing to do is to clear the register. This disables
the interrupt from happening (at least it does on AMD hardware).

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20091027100138.GB30802@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-27 14:54:21 +01:00
Feng Tang
772be899bc x86: Make EFI RTC function depend on 32bit again
The EFI RTC functions are only available on 32 bit. commit 7bd867df
(x86: Move get/set_wallclock to x86_platform_ops) removed the 32bit
dependency which leads to boot crashes on 64bit EFI systems.

Add the dependency back. 
Solves: http://bugzilla.kernel.org/show_bug.cgi?id=14466

Tested-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <20091020125402.028d66d5@feng-desktop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-10-27 12:35:48 +01:00
Jan Beulich
81766741fe x86-64: Fix register leak in 32-bit syscall audting
Restoring %ebp after the call to audit_syscall_exit() is not
only unnecessary (because the register didn't get clobbered),
but in the sysenter case wasn't even doing the right thing: It
loaded %ebp from a location below the top of stack (RBP <
ARGOFFSET), i.e. arbitrary kernel data got passed back to user
mode in the register.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: <stable@kernel.org>
LKML-Reference: <4AE5CC4D020000780001BD13@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-26 16:23:26 +01:00
Jiri Slaby
72ed7de74e x86: crash_dump: Fix non-pae kdump kernel memory accesses
Non-PAE 32-bit dump kernels may wrap an address around 4G and
poke unwanted space. ptes there are 32-bit long, and since
pfn << PAGE_SIZE may exceed this limit, high pfn bits are
cropped and wrong address mapped by kmap_atomic_pfn in
copy_oldmem_page.

Don't allow this behavior in non-PAE kdump kernels by checking
pfns passed into copy_oldmem_page. In the case of failure,
userspace process gets EFAULT.

[v2]
- fix comments
- move ifdefs inside the function

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Paul Mundt <lethal@linux-sh.org>
LKML-Reference: <1256551903-30567-1-git-send-email-jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-26 12:38:59 +01:00
Rusty Russell
ae1b22f6e4 x86: Side-step lguest problem by only building cmpxchg8b_emu for pre-Pentium
Commit 79e1dd05d1 "x86: Provide an alternative() based
cmpxchg64()" broke lguest, even on systems which have cmpxchg8b
support.  The emulation code gets used until alternatives get
run, but it contains native instructions, not their paravirt
alternatives.

The simplest fix is to turn this code off except for 386 and 486
builds.

Reported-by: Johannes Stezenbach <js@sig21.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: lguest@ozlabs.org
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <200910261426.05769.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-26 12:33:02 +01:00
Alexander Potashev
4868402d95 x86, boot: Simplify setting of the PAE bit
A single 'movl' is shorter than the 'xorl'-'orl' pair.
No change in behaviour.

Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
LKML-Reference: <1256341043-4928-1-git-send-email-aspotashev@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-24 11:06:38 +02:00
Arjan van de Ven
14a3f40aaf x86: Remove STACKPROTECTOR_ALL
STACKPROTECTOR_ALL has a really high overhead (runtime and stack
footprint) and is not really worth it protection wise (the
normal STACKPROTECTOR is in effect for all functions with
buffers already), so lets just remove the option entirely.

Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Eric Sandeen <sandeen@redhat.com>
LKML-Reference: <20091023073101.3dce4ebb@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-23 16:35:23 +02:00
Minchan Kim
b1258ac296 x86: Remove pfn in add_one_highpage_init()
commit cc9f7a0ccf changed
add_one_highpage_init. We don't use pfn any more.
Let's remove unnecessary argument.

This patch doesn't chage function behavior.
This patch is based on v2.6.32-rc5.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
LKML-Reference: <20091022112722.adc8e55c.minchan.kim@barrios-desktop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-23 13:39:26 +02:00
Ingo Molnar
4331595650 Merge branch 'perf/core' into perf/probes
Conflicts:
	tools/perf/Makefile

Merge reason:

 - fix the conflict
 - pick up the pr_*() infrastructure to queue up dependent patch

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-23 08:23:20 +02:00
Linus Torvalds
422b42fa79 Merge branch 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Prevent kvm_init from corrupting debugfs structures
  KVM: MMU: fix pointer cast
  KVM: use proper hrtimer function to retrieve expiration time
2009-10-22 08:26:15 +09:00
Linus Torvalds
4fe71dba2f Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: aesni-intel - Fix irq_fpu_usable usage
  crypto: padlock-sha - Fix stack alignment
2009-10-22 08:16:01 +09:00
Ingo Molnar
9bf4e7fba8 x86, instruction decoder: Fix test_get_len build rules
Add the kernel source include file as well to the include files
search path, to fix this build bug:

 In file included from arch/x86/tools/test_get_len.c:28:
   arch/x86/lib/insn.c:21:26: error: linux/string.h: No such file or directory

Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap<systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091020165531.4145.21872.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-21 14:42:56 +02:00
Robin Holt
02dd0a0613 x86, UV: Set DELIVERY_MODE=4 for vector=NMI_VECTOR in uv_hub_send_ipi()
When sending a NMI_VECTOR IPI using the UV_HUB_IPI_INT register,
we need to ensure the delivery mode field of that register has
NMI delivery selected.

This makes those IPIs true NMIs, instead of flat IPIs. It
matters to reboot sequences and KGDB, both of which use NMI
IPIs.

Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Cc: Martin Hicks <mort@sgi.com>
Cc: <stable@kernel.org>
LKML-Reference: <20091020193620.877322000@alcatraz.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-21 13:31:13 +02:00
Masami Hiramatsu
9983d60d74 x86: Add AES opcodes to opcode map
Add Intel AES opcodes to x86 opcode map. These opcodes are
used in arch/x86/crypt/aesni-intel_asm.S.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap<systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091020165531.4145.21872.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-21 13:25:29 +02:00
Masami Hiramatsu
06ed6ba5ec x86: Fix group attribute decoding bug
Fix a typo in inat_get_group_attribute() which should refer
inat_group_tables, not inat_escape_tables.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap<systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091020165524.4145.97333.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-21 13:25:28 +02:00
Huang Ying
13b79b9715 crypto: aesni-intel - Fix irq_fpu_usable usage
When renaming kernel_fpu_using to irq_fpu_usable, the semantics of the
function is changed too, from mesuring whether kernel is using FPU,
that is, the FPU is NOT available, to measuring whether FPU is usable,
that is, the FPU is available.

But the usage of irq_fpu_usable in aesni-intel_glue.c is not changed
accordingly. This patch fixes this.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2009-10-20 16:20:47 +09:00
Suresh Siddha
d6cc1c3af7 x86-64: add comment for RODATA large page retainment
Add a comment explaining why RODATA is aligned to 2 MB.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-20 14:46:01 +09:00
Suresh Siddha
74e081797b x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA
CONFIG_DEBUG_RODATA chops the large pages spanning boundaries of kernel
text/rodata/data to small 4KB pages as they are mapped with different
attributes (text as RO, RODATA as RO and NX etc).

On x86_64, preserve the large page mappings for kernel text/rodata/data
boundaries when CONFIG_DEBUG_RODATA is enabled. This is done by allowing the
RODATA section to be hugepage aligned and having same RWX attributes
for the 2MB page boundaries

Extra Memory pages padding the sections will be freed during the end of the boot
and the kernel identity mappings will have different RWX permissions compared to
the kernel text mappings.

Kernel identity mappings to these physical pages will be mapped with smaller
pages but large page mappings are still retained for kernel text,rodata,data
mappings.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091014220254.190119924@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-20 14:46:00 +09:00
Suresh Siddha
b9af7c0d44 x86-64: preserve large page mapping for 1st 2MB kernel txt with CONFIG_DEBUG_RODATA
In the first 2MB, kernel text is co-located with kernel static
page tables setup by head_64.S.  CONFIG_DEBUG_RODATA chops this
2MB large page mapping to small 4KB pages as we mark the kernel text as RO,
leaving the static page tables as RW.

With CONFIG_DEBUG_RODATA disabled, OLTP run on NHM-EP shows 1% improvement
with 2% reduction in system time and 1% improvement in iowait idle time.

To recover this, move the kernel static page tables to .data section, so that
we don't have to break the first 2MB of kernel text to small pages with
CONFIG_DEBUG_RODATA.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091014220254.063193621@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-20 14:46:00 +09:00
Huang Ying
0e1227d356 crypto: ghash - Add PCLMULQDQ accelerated implementation
PCLMULQDQ is used to accelerate the most time-consuming part of GHASH,
carry-less multiplication. More information about PCLMULQDQ can be
found at:

http://software.intel.com/en-us/articles/carry-less-multiplication-and-its-usage-for-computing-the-gcm-mode/

Because PCLMULQDQ changes XMM state, its usage must be enclosed with
kernel_fpu_begin/end, which can be used only in process context, the
acceleration is implemented as crypto_ahash. That is, request in soft
IRQ context will be defered to the cryptd kernel thread.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2009-10-19 11:53:06 +09:00
Frederic Weisbecker
0f8f86c7bd Merge commit 'perf/core' into perf/hw-breakpoint
Conflicts:
	kernel/Makefile
	kernel/trace/Makefile
	kernel/trace/trace.h
	samples/Makefile

Merge reason: We need to be uptodate with the perf events development
branch because we plan to rewrite the breakpoints API on top of
perf events.
2009-10-18 01:12:33 +02:00
Ingo Molnar
bb3c3e8071 Merge commit 'v2.6.32-rc5' into perf/probes
Conflicts:
	kernel/trace/trace_event_profile.c

Merge reason: update to -rc5 and resolve conflict.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-17 09:58:25 +02:00
Masami Hiramatsu
d1baf5a5a6 x86: Add AMD prefetch and 3DNow! opcodes to opcode map
Add AMD prefetch and 3DNow! opcode including FEMMS. Since 3DNow!
uses the last immediate byte as an opcode extension byte, x86
insn just treats the extenstion byte as an immediate byte
instead of a part of opcode (insn_get_opcode() decodes first
"0x0f 0x0f" bytes.)

Users who are interested in analyzing 3DNow! opcode still can
decode it by analyzing the immediate byte.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20091017000744.16556.27881.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-17 09:53:59 +02:00
Masami Hiramatsu
8c95bc3e20 x86: Add MMX/SSE opcode groups to opcode map
Add missing MMX/SSE opcode groups to x86 opcode map.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20091017000736.16556.29061.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-17 09:53:58 +02:00
Frederik Deweerdt
8a8365c560 KVM: MMU: fix pointer cast
On a 32 bits compile, commit 3da0dd433d
introduced the following warnings:

arch/x86/kvm/mmu.c: In function ‘kvm_set_pte_rmapp’:
arch/x86/kvm/mmu.c:770: warning: cast to pointer from integer of different size
arch/x86/kvm/mmu.c: In function ‘kvm_set_spte_hva’:
arch/x86/kvm/mmu.c:849: warning: cast from pointer to integer of different size

The following patch uses 'unsigned long' instead of u64 to match the
pointer size on both arches.

Signed-off-by: Frederik Deweerdt <frederik.deweerdt@xprog.eu>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-10-16 12:30:26 -03:00
Marcelo Tosatti
ace1546487 KVM: use proper hrtimer function to retrieve expiration time
hrtimer->base can be temporarily NULL due to racing hrtimer_start.
See switch_hrtimer_base/lock_hrtimer_base.

Use hrtimer_get_remaining which is robust against it.

CC: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-10-16 12:30:25 -03:00
Robin Holt
1d21e6e3ff x86, UV: Fix and clean up bau code to use uv_gpa_to_pnode()
Create an inline function to extract the pnode from a global
physical address and then convert the broadcast assist unit to
use the newly created uv_gpa_to_pnode function.

The open-coded code was wrong as well - it might explain a
few of our unexplained bau hangs.

Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Cliff Whickman <cpw@sgi.com>
Cc: linux-mm@kvack.org
Cc: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20091016112920.GZ8903@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-16 14:51:53 +02:00
Borislav Petkov
b33a636364 x86, mce: Add a global MCE init helper
Add an early initcall (pre SMP) which sets up global MCE
functionality.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <1255689093-26921-2-git-send-email-borislav.petkov@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-16 14:46:50 +02:00
Borislav Petkov
5e09954a9a x86, mce: Fix up MCE naming nomenclature
Prefix global/setup routines with "mcheck_" thus differentiating
from the internal facilities prefixed with "mce_". Also, prefix
the per cpu calls with mcheck_cpu and rename them to reflect the
MCE setup hierarchy of calls better.

There should be no functionality change resulting from this
patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <1255689093-26921-1-git-send-email-borislav.petkov@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-16 14:46:49 +02:00
Ingo Molnar
6b50f5c7c7 Merge branches 'x86/mce' and 'x86/urgent' into perf/mce
Merge reason: Put all MCE changes into this branch, we are
              queueing up a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-16 14:42:25 +02:00
Roland Dreier
93ae5012a7 x86: Don't print number of MCE banks for every CPU
The MCE initialization code explicitly says it doesn't handle
asymmetric configurations where different CPUs support different
numbers of MCE banks, and it prints a big warning in that case.

Therefore, printing the "mce: CPU supports <x> MCE banks"
message into the kernel log for every CPU is pure redundancy
that clutters the log significantly for systems with lots of
CPUs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
LKML-Reference: <adaeip473qt.fsf@cisco.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-16 09:20:03 +02:00
Robin Holt
036ed8ba61 x86, UV: Fix information in __uv_hub_info structure
A few parts of the uv_hub_info structure are initialized
incorrectly.

 - n_val is being loaded with m_val.
 - gpa_mask is initialized with a bytes instead of an unsigned long.
 - Handle the case where none of the alias registers are used.

Lastly I converted the bau over to using the uv_hub_info->m_val
which is the correct value.

Without this patch, booting a large configuration hits a
problem where the upper bits of the gnode affect the pnode
and the bau will not operate.

Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Cc: Cliff Whickman <cpw@sgi.com>
Cc: stable@kernel.org
LKML-Reference: <20091015224946.396355000@alcatraz.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-16 08:18:34 +02:00
Ingo Molnar
a5912f6b3e x86: Document linker script ASSERT() quirk
Older binutils breaks if ASSERT() is used without a sink
for the output.

For example 2.14.90.0.6 is known to be broken, the link
fails with:

  LD      .tmp_vmlinux1
  ld:arch/x86/kernel/vmlinux.lds:678: parse error

Document this quirk in all three files that use it.

  See:    http://marc.info/?l=linux-kbuild&m=124930110427870&w=2
  See[2]: d2ba8b2 ("x86: Fix assert syntax in vmlinux.lds.S")

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <4AD6523D.5030909@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-16 07:18:46 +02:00
Cyrill Gorcunov
f88f2b4fdb x86: apic: Allow noop operations to be called almost at any time
As only apic noop is used we allow to use almost any operation
caller wants (and which of them noop driver supports of
course).

Initially it was reported by Ingo Molnar that apic noop
issue a warning for pkg id (which is actually false positive
and should be eliminated).

So we save checking (and warning issue) for read/write
operations while allow any other ops to be freely used.

Also:
 - fix noop_cpu_to_logical_apicid, it should be 0.
 - rename noop_default_phys_pkg_id to noop_phys_pkg_id
   (we use default_ prefix for more general routines
    in apic subsystem).

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
LKML-Reference: <20091015150416.GC5331@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-15 17:26:53 +02:00
Ingo Molnar
713490e02e Merge branch 'tracing/core' into perf/core
Merge reason: to add event filter support we need the following
commits from the tracing tree:

 3f6fe06: tracing/filters: Unify the regex parsing helpers
 1889d20: tracing/filters: Provide basic regex support
 737f453: tracing/filters: Cleanup useless headers

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-15 11:34:00 +02:00
Ingo Molnar
b226f744d4 Merge branch 'linus' into perf/core
Merge reason: pick up tools/perf/ changes from upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-15 08:44:44 +02:00
Ingo Molnar
db8590f504 Revert "x86: linker script syntax nits"
This reverts commit e9a63a4e55.

This breaks older binutils, where sink-less asserts are broken.

See this commit for further details:

  d2ba8b2: x86: Fix assert syntax in vmlinux.lds.S

Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4AD6523D.5030909@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-15 08:09:55 +02:00
Ingo Molnar
a0738a688d Merge branch 'linus' into x86/urgent
Merge reason: pull in latest, to be able to revert a patch there.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-15 08:07:30 +02:00
Linus Torvalds
601adfedba Merge branch 'topic/x86-lds-nits' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland
* 'topic/x86-lds-nits' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
  x86: linker script syntax nits
2009-10-14 15:33:05 -07:00
Linus Torvalds
f061d83a2b Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix missing kernel-doc notation
  Revert "x86, timers: Check for pending timers after (device) interrupts"
  sched: Update the clock of runqueue select_task_rq() selected
2009-10-14 15:25:04 -07:00
Linus Torvalds
ea87644105 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/paravirt: Use normal calling sequences for irq enable/disable
  x86: fix kernel panic on 32 bits when profiling
  x86: Fix Suspend to RAM freeze on Acer Aspire 1511Lmi laptop
  x86, vmi: Mark VMI deprecated and schedule it for removal
2009-10-14 15:24:32 -07:00
Roland McGrath
e9a63a4e55 x86: linker script syntax nits
The linker scripts grew some use of weirdly wrong linker script syntax.
It happens to work, but it's not what the syntax is documented to be.
Clean it up to use the official syntax.

Signed-off-by: Roland McGrath <roland@redhat.com>
CC: Ian Lance Taylor <iant@google.com>
2009-10-14 14:16:38 -07:00
Dimitri Sivanich
4a4de9c7d7 x86: UV RTC: Rename generic_interrupt to x86_platform_ipi
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
LKML-Reference: <20091014142257.GE11048@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 18:27:11 +02:00
Dimitri Sivanich
d5991ff297 x86: UV RTC: Clean up error handling
Cleanup error handling in uv_rtc_setup_clock.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
LKML-Reference: <20091014142103.GD11048@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 18:27:10 +02:00
Dimitri Sivanich
8c28de4d01 x86: UV RTC: Add clocksource only boot option
Add clocksource only boot option for UV RTC.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
LKML-Reference: <20091014141848.GC11048@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 18:27:10 +02:00
Dimitri Sivanich
e47938b1fa x86: UV RTC: Fix early expiry handling
Tune/fix early timer expiry handling and return correct early timeout value
for set_next_event.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
LKML-Reference: <20091014141630.GB11048@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 18:27:09 +02:00
Thomas Gleixner
05d86412ea x86: Remove BKL from apm_32
The lock/unlock kernel pair in do_open() got there with the BKL push
down and protects nothing. Remove it.

Replace the lock/unlock kernel in the ioctl code with a mutex to
protect standbys_pending and suspends_pending.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <20091010153349.365236337@linutronix.de>
2009-10-14 17:04:48 +02:00
Thomas Gleixner
ac06ea2cd0 x86: Remove BKL from microcode
cycle_lock_kernel() in microcode_open() is a worthless exercise as
there is nothing to wait for. Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <20091010153349.196074920@linutronix.de>
2009-10-14 17:04:48 +02:00
Ingo Molnar
7ec13187ef x86, apic: Fix prototype in hw_irq.h
This warning:

 In file included from arch/x86/include/asm/ipi.h:23,
                  from arch/x86/kernel/apic/apic_noop.c:27:
 arch/x86/include/asm/hw_irq.h:105: warning: ‘struct irq_desc’ declared inside parameter list
 arch/x86/include/asm/hw_irq.h:105: warning: its scope is only this definition or declaration, which is probably not what you want

triggers because irq_desc is defined after hw_irq.h is included
in irq.h. Since it's pointer reference only, a forward declaration
of the type will solve the problem.

LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 15:06:42 +02:00
Peter Zijlstra
799e2205ec sched: Disable SD_PREFER_LOCAL for MC/CPU domains
Yanmin reported that both tbench and hackbench were significantly
hurt by trying to keep tasks local on these domains, esp on small
cache machines.

So disable it in order to promote spreading outside of the cache
domains.

Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1255083400.8802.15.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 15:02:34 +02:00
Li Hong
89ccf465ab x86, perf_event: Rename 'performance counter interrupt'
In 'cdd6c482c9ff9c55475ee7392ec8f672eddb7be6', we renamed
Performance Counters -> Performance Events.

The name showed up in /proc/interrupts also needs a change. I use
PMI (Performance monitoring interrupt) here, since it is the
official name used in Intel's documents.

Signed-off-by: Li Hong <lihong.hi@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20091014105039.GA22670@uhli>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 14:37:24 +02:00
Frederic Weisbecker
c44fc77084 tracing: Move syscalls metadata handling from arch to core
Most of the syscalls metadata processing is done from arch.
But these operations are mostly generic accross archs. Especially now
that we have a common variable name that expresses the number of
syscalls supported by an arch: NR_syscalls, the only remaining bits
that need to reside in arch is the syscall nr to addr translation.

v2: Compare syscalls symbols only after the "sys" prefix so that we
    avoid spurious mismatches with archs that have syscalls wrappers,
    in which case syscalls symbols have "SyS" prefixed aliases.
    (Reported by: Heiko Carstens)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
2009-10-14 09:53:56 +02:00
Dimitri Sivanich
9338ad6ffb x86, apic: Move SGI UV functionality out of generic IO-APIC code
Move UV specific functionality out of the generic IO-APIC code.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
LKML-Reference: <20091013203236.GD20543@sgi.com>
[ Cleaned up the code some more in their new places. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 09:17:09 +02:00
Dimitri Sivanich
6c2c502910 x86: SGI UV: Fix irq affinity for hub based interrupts
This patch fixes handling of uv hub irq affinity.  IRQs with ALL or
NODE affinity can be routed to cpus other than their originally
assigned cpu.  Those with CPU affinity cannot be rerouted.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
LKML-Reference: <20090930160259.GA7822@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 09:17:01 +02:00
Cyrill Gorcunov
2626eb2b2f x86, apic: Limit apic dumping, introduce new show_lapic= setup option
In case if a system has a large number of cpus printing apics
contents may consume a long time period.

We limit such an output by 1 apic by default. But to have an
ability to see all apics or some part of them we introduce
"show_lapic" setup option which allow us to limit/unlimit the
number of APICs being dumped.

Example: apic=debug show_lapic=5, or apic=debug show_lapic=all

Also move apic_verbosity checking upper that way so helper routines
do not need to inspect it at all.

Suggested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: yinghai@kernel.org
Cc: macro@linux-mips.org
LKML-Reference: <20091013201022.926793122@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 09:17:01 +02:00
Cyrill Gorcunov
a933c61829 x86, apic: Use apic noop driver
In case if apic were disabled we may use the whole apic NOOP driver
instead of sparse poking the some functions in apic driver.

Also NOOP would catch any inappropriate apic operation calls (not
just read/write).

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: yinghai@kernel.org
Cc: macro@linux-mips.org
LKML-Reference: <20091013201022.747817361@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 09:17:00 +02:00
Cyrill Gorcunov
9844ab11c7 x86, apic: Introduce the NOOP apic driver
Introduce NOOP APIC driver. We should use it in case if apic was
disabled due to hardware of software/firmware problems (including
user requested to disable it case).

The driver is attempting to catch any inappropriate apic operation
call with warning issue.

Also it is possible to use some apic operation like IPI calls,
read/write without checking for apic presence which should make
callers code easier.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: yinghai@kernel.org
Cc: macro@linux-mips.org
LKML-Reference: <20091013201022.534682104@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 09:17:00 +02:00
Steven Rostedt
194ec34184 function-graph/x86: Replace unbalanced ret with jmp
The function graph tracer replaces the return address with a hook
to trace the exit of the function call. This hook will finish by
returning to the real location the function should return to.

But the current implementation uses a ret to jump to the real
return location. This causes a imbalance between calls and ret.
That is the original function does a call, the ret goes to the
handler and then the handler does a ret without a matching call.

Although the function graph tracer itself still breaks the branch
predictor by replacing the original ret, by using a second ret and
causing an imbalance, it breaks the predictor even more.

This patch replaces the ret with a jmp to keep the calls and ret
balanced. I tested this on one box and it showed a 1.7% increase in
performance. Another box only showed a small 0.3% increase. But no
box that I tested this on showed a decrease in performance by
making this change.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091013203425.042034383@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14 08:13:53 +02:00
Linus Torvalds
80fa680d22 Merge git://git.infradead.org/~dwmw2/iommu-2.6.32
* git://git.infradead.org/~dwmw2/iommu-2.6.32:
  x86: Move pci_iommu_init to rootfs_initcall()
  Run pci_apply_final_quirks() sooner.
  Mark pci_apply_final_quirks() __init rather than __devinit
  Rename pci_init() to pci_apply_final_quirks(), move it to quirks.c
  intel-iommu: Yet another BIOS workaround: Isoch DMAR unit with no TLB space
  intel-iommu: Decode (and ignore) RHSA entries
  intel-iommu: Make "Unknown DMAR structure" message more informative
2009-10-13 10:04:40 -07:00
Hidetoshi Seto
8968f9d3dc perf_event, x86, mce: Use TRACE_EVENT() for MCE logging
This approach is the first baby step towards solving many of the
structural problems the x86 MCE logging code is having today:

 - It has a private ring-buffer implementation that has a number
   of limitations and has been historically fragile and buggy.

 - It is using a quirky /dev/mcelog ioctl driven ABI that is MCE
   specific. /dev/mcelog is not part of any larger logging
   framework and hence has remained on the fringes for many years.

 - The MCE logging code is still very unclean partly due to its ABI
   limitations. Fields are being reused for multiple purposes, and
   the whole message structure is limited and x86 specific to begin
   with.

All in one, the x86 tree would like to move away from this private
implementation of an event logging facility to a broader framework.

By using perf events we gain the following advantages:

 - Multiple user-space agents can access MCE events. We can have an
   mcelog daemon running but also a system-wide tracer capturing
   important events in flight-recorder mode.

 - Sampling support: the kernel and the user-space call-chain of MCE
   events can be stored and analyzed as well. This way actual patterns
   of bad behavior can be matched to precisely what kind of activity
   happened in the kernel (and/or in the app) around that moment in
   time.

 - Coupling with other hardware and software events: the PMU can track a
   number of other anomalies - monitoring software might chose to
   monitor those plus the MCE events as well - in one coherent stream of
   events.

 - Discovery of MCE sources - tracepoints are enumerated and tools can
   act upon the existence (or non-existence) of various channels of MCE
   information.

 - Filtering support: we just subscribe to and act upon the events we
   are interested in. Then even on a per event source basis there's
   in-kernel filter expressions available that can restrict the amount
   of data that hits the event channel.

 - Arbitrary deep per cpu buffering of events - we can buffer 32
   entries or we can buffer as much as we want, as long as we have
   the RAM.

 - An NMI-safe ring-buffer implementation - mappable to user-space.

 - Built-in support for timestamping of events, PID markers, CPU
   markers, etc.

 - A rich ABI accessible over system call interface. Per cpu, per task
   and per workload monitoring of MCE events can be done this way. The
   ABI itself has a nice, meaningful structure.

 - Extensible ABI: new fields can be added without breaking tooling.
   New tracepoints can be added as the hardware side evolves. There's
   various parsers that can be used.

 - Lots of scheduling/buffering/batching modes of operandi for MCE
   events. poll() support. mmap() support. read() support. You name it.

 - Rich tooling support: even without any MCE specific extensions added
   the 'perf' tool today offers various views of MCE data: perf report,
   perf stat, perf trace can all be used to view logged MCE events and
   perhaps correlate them to certain user-space usage patterns. But it
   can be used directly as well, for user-space agents and policy action
   in mcelog, etc.

With this we hope to achieve significant code cleanup and feature
improvements in the MCE code, and we hope to be able to drop the
/dev/mcelog facility in the end.

This patch is just a plain dumb dump of mce_log() records to
the tracepoints / perf events framework - a first proof of
concept step.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <4AD42A0D.7050104@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 09:43:38 +02:00
Ingo Molnar
9dbdd6c41c Merge commit 'v2.6.32-rc4' into perf/core
Merge reason: we were on an -rc1 base, merge up to -rc4.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 09:31:34 +02:00
Ingo Molnar
2c96c142e9 Merge branch 'tracing/urgent' into tracing/core
Merge reason: Pick up tracing/filters fix from the urgent queue,
              we will queue up dependent patches.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 09:24:59 +02:00
Jeremy Fitzhardinge
71999d9862 x86/paravirt: Use normal calling sequences for irq enable/disable
Bastian Blank reported a boot crash with stackprotector enabled,
and debugged it back to edx register corruption.

For historical reasons irq enable/disable/save/restore had special
calling sequences to make them more efficient.  With the more
recent introduction of higher-level and more general optimisations
this is no longer necessary so we can just use the normal PVOP_
macros.

This fixes some residual bugs in the old implementations which left
edx liable to inadvertent clobbering. Also, fix some bugs in
__PVOP_VCALLEESAVE which were revealed by actual use.

Reported-by: Bastian Blank <bastian@waldi.eu.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: Xen-devel <xen-devel@lists.xensource.com>
LKML-Reference: <4AD3BC9B.7040501@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 09:22:01 +02:00
Arnaldo Carvalho de Melo
a2e2725541 net: Introduce recvmmsg socket syscall
Meaning receive multiple messages, reducing the number of syscalls and
net stack entry/exit operations.

Next patches will introduce mechanisms where protocols that want to
optimize this operation will provide an unlocked_recvmsg operation.

This takes into account comments made by:

. Paul Moore: sock_recvmsg is called only for the first datagram,
  sock_recvmsg_nosec is used for the rest.

. Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
  works in the same fashion as the ppoll one.

  If the underlying protocol returns a datagram with MSG_OOB set, this
  will make recvmmsg return right away with as many datagrams (+ the OOB
  one) it has received so far.

. Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
  datagrams and then recvmsg returns an error, recvmmsg will return
  the successfully received datagrams, store the error and return it
  in the next call.

This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
where we will be able to acquire the lock only at batch start and end, not at
every underlying recvmsg call.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12 23:40:10 -07:00
Ingo Molnar
7a693d3f0d perf_events, x86: Fix event constraints code
There was namespace overlap due to a rename i did - this caused
the following build warning, reported by Stephen Rothwell against
linux-next x86_64 allmodconfig:

  arch/x86/kernel/cpu/perf_event.c: In function 'intel_get_event_idx':
  arch/x86/kernel/cpu/perf_event.c:1445: warning: 'event_constraint' is used uninitialized in this function

This is a real bug not just a warning: fix it by renaming the
global event-constraints table pointer to 'event_constraints'.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091013144223.369d616d.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-13 08:19:53 +02:00
H. Peter Anvin
98272ed0d2 x86: use kernel_stack_pointer() in kprobes.c
The way to obtain a kernel-mode stack pointer from a struct pt_regs in
32-bit mode is "subtle": the stack doesn't actually contain the stack
pointer, but rather the location where it would have been marks the
actual previous stack frame.  For clarity, use kernel_stack_pointer()
instead of coding this weirdness explicitly.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
2009-10-12 14:19:35 -07:00
H. Peter Anvin
5ca6c0ca5d x86: use kernel_stack_pointer() in kgdb.c
The way to obtain a kernel-mode stack pointer from a struct
pt_regs in 32-bit mode is "subtle": the stack doesn't actually
contain the stack pointer, but rather the location where it would
have been marks the actual previous stack frame.  For clarity, use
kernel_stack_pointer() instead of coding this weirdness
explicitly.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
2009-10-12 14:19:35 -07:00
H. Peter Anvin
a343c75d33 x86: use kernel_stack_pointer() in dumpstack.c
The way to obtain a kernel-mode stack pointer from a struct pt_regs in
32-bit mode is "subtle": the stack doesn't actually contain the stack
pointer, but rather the location where it would have been marks the
actual previous stack frame.  For clarity, use kernel_stack_pointer()
instead of coding this weirdness explicitly.

Furthermore, user_mode() is only valid when the process is known to
not run in V86 mode.  Use the safer user_mode_vm() instead.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-12 14:19:34 -07:00
H. Peter Anvin
def3c5d0a3 x86: use kernel_stack_pointer() in process_32.c
The way to obtain a kernel-mode stack pointer from a struct pt_regs in
32-bit mode is "subtle": the stack doesn't actually contain the stack
pointer, but rather the location where it would have been marks the
actual previous stack frame.  For clarity, use kernel_stack_pointer()
instead of coding this weirdness explicitly.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-12 14:19:34 -07:00
David Rientjes
adc1938994 x86: Interleave emulated nodes over physical nodes
Add interleaved NUMA emulation support

This patch interleaves emulated nodes over the system's physical
nodes. This is required for interleave optimizations since
mempolicies, for example, operate by iterating over a nodemask and
act without knowledge of node distances.  It can also be used for
testing memory latencies and NUMA bugs in the kernel.

There're a couple of ways to do this:

 - divide the number of emulated nodes by the number of physical
   nodes and allocate the result on each physical node, or

 - allocate each successive emulated node on a different physical
   node until all memory is exhausted.

The disadvantage of the first option is, depending on the asymmetry
in node capacities of each physical node, emulated nodes may
substantially differ in size on a particular physical node compared
to another.

The disadvantage of the second option is, also depending on the
asymmetry in node capacities of each physical node, there may be
more emulated nodes allocated on a single physical node as another.

This patch implements the second option; we sacrifice the
possibility that we may have slightly more emulated nodes on a
particular physical node compared to another in lieu of node size
asymmetry.

 [ Note that "node capacity" of a physical node is not only a
   function of its addressable range, but also is affected by
   subtracting out the amount of reserved memory over that range.
   NUMA emulation only deals with available, non-reserved memory
   quantities. ]

We ensure there is at least a minimal amount of available memory
allocated to each node.  We also make sure that at least this
amount of available memory is available in ZONE_DMA32 for any node
that includes both ZONE_DMA32 and ZONE_NORMAL.

This patch also cleans the emulation code up by no longer passing
the statically allocated struct bootnode array among the various
functions. This init.data array is not allocated on the stack since
it may be very large and thus it may be accessed at file scope.

The WARN_ON() for nodes_cover_memory() when faking proximity
domains is removed since it relies on successive nodes always
having greater start addresses than previous nodes; with
interleaving this is no longer always true.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251519150.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:56:46 +02:00
David Rientjes
8716273cae x86: Export srat physical topology
This is the counterpart to "x86: export k8 physical topology" for
SRAT. It is not as invasive because the acpi code already seperates
node setup into detection and registration steps, with the
exception of registering e820 active regions in
acpi_numa_memory_affinity_init().  This is now moved to
acpi_scan_nodes() if NUMA emulation is disabled or deferred.

acpi_numa_init() now returns a value which specifies whether an
underlying SRAT was located.  If so, that topology can be used by
the emulation code to interleave emulated nodes over physical nodes
or to register the nodes for ACPI.

acpi_get_nodes() may now be used to export the srat physical
topology of the machine for NUMA emulation.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518580.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:56:46 +02:00
David Rientjes
8ee2debce3 x86: Export k8 physical topology
To eventually interleave emulated nodes over physical nodes, we
need to know the physical topology of the machine without actually
registering it.  This does the k8 node setup in two parts:
detection and registration.  NUMA emulation can then used the
physical topology detected to setup the address ranges of emulated
nodes accordingly.  If emulation isn't used, the k8 nodes are
registered as normal.

Two formals are added to the x86 NUMA setup functions: `acpi' and
`k8'. These represent whether ACPI or K8 NUMA has been detected;
both cannot be true at the same time.  This specifies to the NUMA
emulation code whether an underlying physical NUMA topology exists
and which interface to use.

This patch deals solely with separating the k8 setup path into
Northbridge detection and registration steps and leaves the ACPI
changes for a subsequent patch.  The `acpi' formal is added here,
however, to avoid touching all the header files again in the next
patch.

This approach also ensures emulated nodes will not span physical
nodes so the true memory latency is not misrepresented.

k8_get_nodes() may now be used to export the k8 physical topology
of the machine for NUMA emulation.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518400.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:56:45 +02:00
David Rientjes
1af5ba514f x86: Clean up and add missing log levels for k8
Convert all printk's in arch/x86/mm/k8topology_64.c to use
pr_info() or pr_err() appropriately.

Adds log levels for messages currently lacking them.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251517440.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:56:45 +02:00
Arjan van de Ven
ad8f4356af x86: Don't use the strict copy checks when branch profiling is in use
The branch profiling creates very complex code for each if
statement, to the point that gcc has trouble even analyzing
something as simple as

  if (count > 5)
      count = 5;

This then means that causing an error on code that gcc cannot
analyze for copy_from_user() and co is not very productive.

This patch excludes the strict copy checks in the case of branch
profiling being enabled.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091006070452.5e1fc119@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:29:51 +02:00
H. Peter Anvin
d1705c558c x86: fix kernel panic on 32 bits when profiling
Latest kernel has a kernel panic in booting on i386 machine when
profile=2 setting in cmdline.  It is due to 'sp' being incorrect in
profile_pc().

BUG: unable to handle kernel NULL pointer dereference at 00000246
IP: [<c01288b6>] profile_pc+0x2a/0x48
*pde = 00000000
Oops: 0000 [#1] SMP

This differs from the original version by Alex Shi in that we use the
kernel_stack_pointer() inline already defined in <asm/ptrace.h> for
this purpose, instead of #ifdef.

Originally-by: Alex Shi <alex.shi@intel.com>
Cc: "Chen, Tim C" <tim.c.chen@intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-12 11:53:51 -07:00
Brian Gerst
ae24ffe5ec x86, 64-bit: Move K8 B step iret fixup to fault entry asm
Move the handling of truncated %rip from an iret fault to the fault
entry path.

This allows x86-64 to use the standard search_extable() function.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <1255357103-5418-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 18:29:46 +02:00
Jan Beulich
7a4b7e5e74 x86: Fix Suspend to RAM freeze on Acer Aspire 1511Lmi laptop
Move the trampoline and accessors back out of .cpuinit.* for the
case of 64-bits+ACPI_SLEEP.

This solves s2ram hangs reported in:

  http://bugzilla.kernel.org/show_bug.cgi?id=14279

Reported-and-bisected-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: <bugzilla-daemon@bugzilla.kernel.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 18:06:48 +02:00
David Woodhouse
9a821b2316 x86: Move pci_iommu_init to rootfs_initcall()
We want this to happen after the PCI quirks, which are now running at
the very end of the fs_initcalls.

This works around the BIOS problems which were originally addressed by
commit db8be50c43 ('USB: Work around BIOS
bugs by quiescing USB controllers earlier'), which was reverted in
commit d93a8f829f.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-10-12 14:42:11 +01:00
Arjan van de Ven
c03cb3149d x86: Relegate CONFIG_PAT and CONFIG_MTRR configurability to EMBEDDED
MTRR and PAT support (which got added to CPUs over 10 years ago)
are no longer really optional in that more and more things are
depending on PAT just working, including various drivers and newer
versions of X.  (to not even speak of MTRR)

Having this as a regular config option just no longer makes sense.

This patch relegates CONFIG_X86_PAT to the EMBEDDED category so
ultra-embedded can still disable it if they really need to.

Also-Suggested-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
LKML-Reference: <20091011103302.62bded41@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 13:06:57 +02:00
Borislav Petkov
fb2531953f mce, edac: Use an atomic notifier for MCEs decoding
Add an atomic notifier which ensures proper locking when conveying
MCE info to EDAC for decoding. The actual notifier call overrides a
default, negative priority notifier.

Note: make sure we register the default decoder only once since
mcheck_init() runs on each CPU.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20091003065752.GA8935@liondog.tnic>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 12:24:45 +02:00
Joe Perches
3c355863fb testmmiotrace.c: Add and use pr_fmt(fmt)
- Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt.
- Strip MODULE_NAME from pr_<level>s.
- Remove MODULE_NAME definition.

Signed-off-by: Joe Perches <joe@perches.com>
LKML-Reference: <3bb66cc7f85f77b9416902e1be7076f7e3f4ad48.1254701151.git.joe@perches.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 08:05:41 +02:00
Joe Perches
3bb258bf43 ftrace.c: Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
- Remove prefixes from pr_<level>, use pr_fmt(fmt).

No change in output.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <9b377eefae9e28c599dd4a17bdc81172965e9931.1254701151.git.joe@perches.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 08:05:40 +02:00
Yinghai Lu
15b812f1d0 pci: increase alignment to make more space for hidden code
As reported in

	http://bugzilla.kernel.org/show_bug.cgi?id=13940

on some system when acpi are enabled, acpi clears some BAR for some
devices without reason, and kernel will need to allocate devices for
them.  It then apparently hits some undocumented resource conflict,
resulting in non-working devices.

Try to increase alignment to get more safe range for unassigned devices.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-11 14:43:36 -07:00
Alexey Dobriyan
d43c36dc6b headers: remove sched.h from interrupt.h
After m68k's task_thread_info() doesn't refer to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-10-11 11:20:58 -07:00
H. Peter Anvin
e634d8fc79 x86-64: merge the standard and compat start_thread() functions
The only thing left that differs between the standard and compat
start_thread functions is the actual segment numbers and the
prototype, so have a single common function which contains the guts
and two very small wrappers.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
2009-10-09 16:26:38 -07:00
H. Peter Anvin
a6f05a6a0a x86-64: make compat_start_thread() match start_thread()
For no real good reason, compat_start_thread() was embedded inline in
<asm/elf.h> whereas the native start_thread() lives in process_*.c.
Move compat_start_thread() to process_64.c, remove gratuitious
differences, and fix a few items which mostly look like bit rot.

In particular, compat_start_thread() didn't do free_thread_xstate(),
which means it was hanging on to the xstate store area even when it
was not needed.  It was also not setting old_rsp, but it looks like
that generally shouldn't matter for a 32-bit process.

Note: compat_start_thread *has* to be a macro, since it is tested with
start_thread_ia32() as the out of line function name.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
2009-10-09 16:26:38 -07:00
Joerg Roedel
c5cca146aa x86/amd-iommu: Workaround for erratum 63
There is an erratum for IOMMU hardware which documents
undefined behavior when forwarding SMI requests from
peripherals and the DTE of that peripheral has a sysmgt
value of 01b. This problem caused weird IO_PAGE_FAULTS in my
case.
This patch implements the suggested workaround for that
erratum into the AMD IOMMU driver.  The erratum is
documented with number 63.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-10-09 18:37:46 +02:00
Ingo Molnar
e7ab0f7b50 Revert "x86, timers: Check for pending timers after (device) interrupts"
This reverts commit 9bcbdd9c58.

The real bug producing LatencyTop latencies has been fixed in:

  f5dc375: sched: Update the clock of runqueue select_task_rq() selected

And the commit being reverted here triggers local timer processing
from every device IRQ. If device IRQs come in at a high frequency,
this could cause a performance regression.

The commit being reverted here purely 'fixed' the reported latency
as a side effect, because CPUs were being moved out of idle more
often.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Frans Pop <elendil@planet.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <20091008064041.67219b13@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-09 15:58:20 +02:00
Peter Zijlstra
f3834b9ef6 x86: Generate cmpxchg build failures
Rework the x86 cmpxchg() implementation to generate build failures
when used on improper types.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1254771187.21044.22.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-09 15:57:00 +02:00
Peter Zijlstra
fe9081cc9b perf, x86: Add simple group validation
Refuse to add events when the group wouldn't fit onto the PMU
anymore.

Naive implementation.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@gmail.com>
LKML-Reference: <1254911461.26976.239.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-09 15:56:14 +02:00
Stephane Eranian
b690081d4d perf_events: Add event constraints support for Intel processors
On some Intel processors, not all events can be measured in all
counters. Some events can only be measured in one particular
counter, for instance. Assigning an event to the wrong counter does
not crash the machine but this yields bogus counts, i.e., silent
error.

This patch changes the event to counter assignment logic to take
into account event constraints for Intel P6, Core and Nehalem
processors. There is no contraints on Intel Atom. There are
constraints on Intel Yonah (Core Duo) but they are not provided in
this patch given that this processor is not yet supported by
perf_events.

As a result of the constraints, it is possible for some event
groups to never actually be loaded onto the PMU if they contain two
events which can only be measured on a single counter. That
situation can be detected with the scaling information extracted
with read().

Signed-off-by: Stephane Eranian <eranian@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1254840129-6198-3-git-send-email-eranian@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-09 15:56:12 +02:00
Stephane Eranian
04a705df47 perf_events: Check for filters on fixed counter events
Intel fixed counters do not support all the filters possible with a
generic counter. Thus, if a fixed counter event is passed but with
certain filters set, then the fixed_mode_idx() function must fail
and the event must be measured in a generic counter instead.

Reject filters are: inv, edge, cnt-mask.

Signed-off-by: Stephane Eranian <eranian@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1254840129-6198-2-git-send-email-eranian@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-09 15:56:10 +02:00
John Kacur
5a943617ef x86, cpuid: Simplify the code in cpuid_open
Peter picked up my patch for tip/x86/cpu that removes the bkl in
cpuid_open. Ingo subsequently merged that into tip/master.

This patch folds back in tglx's 55968ede164ae523692f00717f50cd926f1382a0
to my patch that removed the bkl.

This simplifies the code, and makes it consistent with the changes to
kill the bkl in msr.c as well.

Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Kacur <jkacur@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-08 16:14:02 -07:00
Alok Kataria
d0153ca35d x86, vmi: Mark VMI deprecated and schedule it for removal
Add text in feature-removal.txt indicating that VMI will be removed in
the 2.6.37 timeframe.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
LKML-Reference: <1254193238.13456.48.camel@ank32.eng.vmware.com>
[ removed a bogus Kconfig change, marked (DEPRECATED) in Kconfig ]
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-08 22:27:55 +02:00
Linus Torvalds
624235c5b3 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, pci: Correct spelling in a comment
  x86: Simplify bound checks in the MTRR code
  x86: EDAC: carve out AMD MCE decoding logic
  initcalls: Add early_initcall() for modules
  x86: EDAC: MCE: Fix MCE decoding callback logic
2009-10-08 12:06:36 -07:00
Arjan van de Ven
9bcbdd9c58 x86, timers: Check for pending timers after (device) interrupts
Now that range timers and deferred timers are common, I found a
problem with these using the "perf timechart" tool. Frans Pop also
reported high scheduler latencies via LatencyTop, when using
iwlagn.

It turns out that on x86, these two 'opportunistic' timers only get
checked when another "real" timer happens. These opportunistic
timers have the objective to save power by hitchhiking on other
wakeups, as to avoid CPU wakeups by themselves as much as possible.

The change in this patch runs this check not only at timer
interrupts, but at all (device) interrupts. The effect is that:

 1) the deferred timers/range timers get delayed less

 2) the range timers cause less wakeups by themselves because
    the percentage of hitchhiking on existing wakeup events goes up.

I've verified the working of the patch using "perf timechart", the
original exposed bug is gone with this patch. Frans also reported
success - the latencies are now down in the expected ~10 msec
range.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Tested-by: Frans Pop <elendil@planet.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091008064041.67219b13@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-08 17:27:27 +02:00
John Kacur
170a0bc380 x86, cpuid: Remove the bkl from cpuid_open()
Most of the variables are local to the function. It IS possible that
for struct cpuinfo_x86 *c c could point to the same area. However,
this is used read only.

Signed-off-by: John Kacur <jkacur@redhat.com>
LKML-Reference: <alpine.LFD.2.00.0910072016190.15183@localhost.localdomain>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-07 15:41:21 -07:00
Frederic Weisbecker
d6c304055b x86, msr: Remove the bkl from msr_open()
Remove the big kernel lock from msr_open() as it doesn't protect
anything there.

The only racy event that can happen here is a concurrent cpu shutdown.

So let's look at what could be racy during/after the above event:

- The cpu_online() check is racy, but the bkl doesn't help about
  that anyway it disables preemption but we may be chcking another
  cpu than the current one.
  Also the cpu can still become offlined between open and read calls.

- The cpu_data(cpu) returns a safe pointer too. It won't be released on
  cpu offlining. But some fields can be changed from
  arch/x86/kernel/smpboot.c:remove_siblinginfo() :

	- phys_proc_id
	- cpu_core_id

  Those are not read from msr_open(). What we are checking is the
  x86_capability that is left untouched on offlining.

So this removal looks safe.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sven-Thorsten Dietrich <sdietrich@suse.de>
LKML-Reference: <1254944602-7382-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-07 13:47:19 -07:00
Linus Torvalds
19d031e052 Merge branch 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: add support for change_pte mmu notifiers
  KVM: MMU: add SPTE_HOST_WRITEABLE flag to the shadow ptes
  KVM: MMU: dont hold pagecount reference for mapped sptes pages
  KVM: Prevent overflow in KVM_GET_SUPPORTED_CPUID
  KVM: VMX: flush TLB with INVEPT on cpu migration
  KVM: fix LAPIC timer period overflow
  KVM: s390: fix memsize >= 4G
  KVM: SVM: Handle tsc in svm_get_msr/svm_set_msr correctly
  KVM: SVM: Fix tsc offset adjustment when running nested
2009-10-05 12:07:39 -07:00
Linus Torvalds
46302b46e5 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Don't leak 64-bit kernel register values to 32-bit processes
  x86, SLUB: Remove unused CONFIG FAST_CMPXCHG_LOCAL
  x86: earlyprintk: Fix regression to handle serial,ttySn as 1 arg
  x86: Don't generate cmpxchg8b_emu if CONFIG_X86_CMPXCHG64=y
  x86: Fix csum_ipv6_magic asm memory clobber
  x86: Optimize cmpxchg64() at build-time some more
2009-10-05 12:02:18 -07:00
Izik Eidus
3da0dd433d KVM: add support for change_pte mmu notifiers
this is needed for kvm if it want ksm to directly map pages into its
shadow page tables.

[marcelo: cast pfn assignment to u64]

Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-10-04 17:04:53 +02:00
Izik Eidus
1403283acc KVM: MMU: add SPTE_HOST_WRITEABLE flag to the shadow ptes
this flag notify that the host physical page we are pointing to from
the spte is write protected, and therefore we cant change its access
to be write unless we run get_user_pages(write = 1).

(this is needed for change_pte support in kvm)

Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-10-04 17:04:50 +02:00
Izik Eidus
acb66dd051 KVM: MMU: dont hold pagecount reference for mapped sptes pages
When using mmu notifiers, we are allowed to remove the page count
reference tooken by get_user_pages to a specific page that is mapped
inside the shadow page tables.

This is needed so we can balance the pagecount against mapcount
checking.

(Right now kvm increase the pagecount and does not increase the
mapcount when mapping page into shadow page table entry,
so when comparing pagecount against mapcount, you have no
reliable result.)

Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-10-04 17:04:48 +02:00
Avi Kivity
6a54435560 KVM: Prevent overflow in KVM_GET_SUPPORTED_CPUID
The number of entries is multiplied by the entry size, which can
overflow on 32-bit hosts.  Bound the entry count instead.

Reported-by: David Wagner <daw@cs.berkeley.edu>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-10-04 17:04:16 +02:00
Marcelo Tosatti
eb5109e311 KVM: VMX: flush TLB with INVEPT on cpu migration
It is possible that stale EPTP-tagged mappings are used, if a
vcpu migrates to a different pcpu.

Set KVM_REQ_TLB_FLUSH in vmx_vcpu_load, when switching pcpus, which
will invalidate both VPID and EPT mappings on the next vm-entry.

Cc: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-10-04 13:57:24 +02:00
Aurelien Jarno
b2d83cfa3f KVM: fix LAPIC timer period overflow
Don't overflow when computing the 64-bit period from 32-bit registers.

Fixes sourceforge bug #2826486.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Cc: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-10-04 13:57:23 +02:00
Joerg Roedel
20824f30bb KVM: SVM: Handle tsc in svm_get_msr/svm_set_msr correctly
When running nested we need to touch the l1 guests
tsc_offset. Otherwise changes will be lost or a wrong value
be read.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-10-04 13:57:23 +02:00
Joerg Roedel
77b1ab1732 KVM: SVM: Fix tsc offset adjustment when running nested
When svm_vcpu_load is called while the vcpu is running in
guest mode the tsc adjustment made there is lost on the next
emulated #vmexit. This causes the tsc running backwards in
the guest. This patch fixes the issue by also adjusting the
tsc_offset in the emulated hsave area so that it will not
get lost.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-10-04 13:57:22 +02:00
Marin Mitov
e3be785fb5 x86, pci: Correct spelling in a comment
Signed-off-by: Marin Mitov <mitov@issp.bas.bg>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
LKML-Reference: <200910032045.02523.mitov@issp.bas.bg>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
======================================================
2009-10-03 20:35:16 +02:00
Masami Hiramatsu
c0b11d3af1 x86: Add VIA processor instructions in opcodes decoder
Add VIA processor's Padlock instructions(MONTMUL, XSHA1, XSHA256)
as parts of the kernel may use them.

This fixes the following crash in opcodes decoder selftests:

 make[2]: `scripts/unifdef' is up to date.
   TEST    posttest
 Error: c145cf71:        f3 0f a6 d0             repz xsha256
 Error: objdump says 4 bytes, but insn_get_length() says 3 (attr:0)
 make[1]: *** [posttest] Error 2
 make: *** [bzImage] Error 2

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <20090925182037.10157.3180.stgit@omoto>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-10-03 02:43:00 +02:00
Matteo Croce
98059e3463 x86: AMD Geode LX optimizations
Add CPU optimizations for AMD Geode LX.

Signed-off-by: Matteo Croce <technoboy85@gmail.com>
LKML-Reference: <40101cc30910010811v5d15ff4cx9dd57c9cc9b4b045@mail.gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-02 11:24:28 -07:00
Arjan van de Ven
11879ba5d9 x86: Simplify bound checks in the MTRR code
The current bound checks for copy_from_user in the MTRR driver are
not as obvious as they could be, and gcc agrees with that.

This patch simplifies the boundary checks to the point that gcc can
now prove to itself that the copy_from_user() is never going past
its bounds.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20090926205150.30797709@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-02 19:51:56 +02:00
Arjan van de Ven
63312b6a6f x86: Add a Kconfig option to turn the copy_from_user warnings into errors
For automated testing it is useful to have the option to turn
the warnings on copy_from_user() etc checks into errors:

 In function ‘copy_from_user’,
     inlined from ‘fd_copyin’ at drivers/block/floppy.c:3080,
     inlined from ‘fd_ioctl’ at drivers/block/floppy.c:3503:
   linux/arch/x86/include/asm/uaccess_32.h:213:
  error: call to ‘copy_from_user_overflow’ declared with attribute error:
  copy_from_user buffer size is not provably correct

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20091002075050.4e9f7641@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-02 19:01:42 +02:00
Ingo Molnar
f436f8bb73 x86: EDAC: MCE: Fix MCE decoding callback logic
Make decoding of MCEs happen only on AMD hardware by registering a
non-default callback only on CPU families which support it.

While looking at the interaction of decode_mce() with the other MCE
code i also noticed a few other things and made the following
cleanups/fixes:

 - Fixed the mce_decode() weak alias - a weak alias is really not
   good here, it should be a proper callback. A weak alias will be
   overriden if a piece of code is built into the kernel - not
   good, obviously.

 - The patch initializes the callback on AMD family 10h and 11h.

 - Added the more correct fallback printk of:

	No support for human readable MCE decoding on this CPU type.
	Transcribe the message and run it through 'mcelog --ascii' to decode.

   On CPUs that dont have a decoder.

 - Made the surrounding code more readable.

Note that the callback allows us to have a default fallback -
without having to check the CPU versions during the printout
itself. When an EDAC module registers itself, it can install the
decode-print function.

(there's no unregister needed as this is core code.)

version -v2 by Borislav Petkov:

 - add K8 to the set of supported CPUs

 - always build in edac_mce_amd since we use an early_initcall now

 - fix checkpatch warnings

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <20091001141432.GA11410@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-02 15:42:18 +02:00
Samuel Thibault
392d814daf x86: fix csum_ipv6_magic asm memory clobber
Just like ip_fast_csum, the assembly snippet in csum_ipv6_magic needs a
memory clobber, as it is only passed the address of the buffer, not a
memory reference to the buffer itself.

This caused failures in Hurd's pfinetv4 when we tried to compile it with
gcc-4.3 (bogus checksums).

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:12 -07:00