GUEST_CR3 is updated via kvm_set_cr3 whenever CR3 is modified from
outside guest context. Similarly pdptrs are updated via load_pdptrs.
Let kvm_set_cr3 perform the update, removing it from the vcpu_run
fast path.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Acked-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of reloading syscall MSRs on every preemption, use the new shared
msr infrastructure to reload them at the last possible minute (just before
exit to userspace).
Improves vcpu/idle/vcpu switches by about 2000 cycles (when EFER needs to be
reloaded as well).
[jan: fix slot index missing indirection]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The various syscall-related MSRs are fairly expensive to switch. Currently
we switch them on every vcpu preemption, which is far too often:
- if we're switching to a kernel thread (idle task, threaded interrupt,
kernel-mode virtio server (vhost-net), for example) and back, then
there's no need to switch those MSRs since kernel threasd won't
be exiting to userspace.
- if we're switching to another guest running an identical OS, most likely
those MSRs will have the same value, so there's little point in reloading
them.
- if we're running the same OS on the guest and host, the MSRs will have
identical values and reloading is unnecessary.
This patch uses the new user return notifiers to implement last-minute
switching, and checks the msr values to avoid unnecessary reloading.
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently MSR_KERNEL_GS_BASE is saved and restored as part of the
guest/host msr reloading. Since we wish to lazy-restore all the other
msrs, save and reload MSR_KERNEL_GS_BASE explicitly instead of using
the common code.
Signed-off-by: Avi Kivity <avi@redhat.com>
The svm_set_cr0() call will initialize save->cr0 properly even when npt is
enabled, clearing the NW and CD bits as expected, so we don't need to
initialize it manually for npt_enabled anymore.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
svm_vcpu_reset() was not properly resetting the contents of the guest-visible
cr0 register, causing the following issue:
https://bugzilla.redhat.com/show_bug.cgi?id=525699
Without resetting cr0 properly, the vcpu was running the SIPI bootstrap routine
with paging enabled, making the vcpu get a pagefault exception while trying to
run it.
Instead of setting vmcb->save.cr0 directly, the new code just resets
kvm->arch.cr0 and calls kvm_set_cr0(). The bits that were set/cleared on
vmcb->save.cr0 (PG, WP, !CD, !NW) will be set properly by svm_set_cr0().
kvm_set_cr0() is used instead of calling svm_set_cr0() directly to make sure
kvm_mmu_reset_context() is called to reset the mmu to nonpaging mode.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This should have no effect, it is just to make the code clearer.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.
Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.
This proposed ioctl will allow userspace to inform us what is the monotonic
clock value in the source host, so we can keep the time skew short, and
more importantly, never goes backwards. Userspace may also need to trigger
the current data, since from the first migration onwards, it won't be
reflected by a simple call to clock_gettime() anymore.
[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Commit 705c5323 opened the doors of hell by unconditionally injecting
single-step flags as long as guest_debug signaled this. This doesn't
work when the guest branches into some interrupt or exception handler
and triggers a vmexit with flag reloading.
Fix it by saving cs:rip when user space requests single-stepping and
restricting the trace flag injection to this guest code position.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.
A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future. Thus this patch
adds special support for the Xen HVM MSR.
I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files. When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.
I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.
[jan: fix i386 build warning]
[avi: future proof abi with a flags field]
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This (broken) check dates back to the days when this code was shared
across architectures. x86 has IOMEM, so drop it.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If cpufreq can't determine the CPU khz, or cpufreq is not compiled in,
we should fallback to the measured TSC khz.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
New AMD processors (Family 0x10 models 8+) support the Pause
Filter Feature. This feature creates a new field in the VMCB
called Pause Filter Count. If Pause Filter Count is greater
than 0 and intercepting PAUSEs is enabled, the processor will
increment an internal counter when a PAUSE instruction occurs
instead of intercepting. When the internal counter reaches the
Pause Filter Count value, a PAUSE intercept will occur.
This feature can be used to detect contended spinlocks,
especially when the lock holding VCPU is not scheduled.
Rescheduling another VCPU prevents the VCPU seeking the
lock from wasting its quantum by spinning idly.
Experimental results show that most spinlocks are held
for less than 1000 PAUSE cycles or more than a few
thousand. Default the Pause Filter Counter to 3000 to
detect the contended spinlocks.
Processor support for this feature is indicated by a CPUID
bit.
On a 24 core system running 4 guests each with 16 VCPUs,
this patch improved overall performance of each guest's
32 job kernbench by approximately 3-5% when combined
with a scheduler algorithm thati caused the VCPU to
sleep for a brief period. Further performance improvement
may be possible with a more sophisticated yield algorithm.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap - upper bound on the amount of time between two successive
executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute in
a PAUSE loop
If the time, between this execution of PAUSE and previous one, exceeds the
PLE_Gap, processor consider this PAUSE belongs to a new loop.
Otherwise, processor determins the the total execution time of this loop(since
1st PAUSE in this loop), and triggers a VM exit if total time exceeds the
PLE_Window.
* Refer SDM volume 3b section 21.6.13 & 22.1.3.
Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP
is sched-out after hold a spinlock, then other VPs for same lock are sched-in
to waste the CPU time.
Our tests indicate that most spinlocks are held for less than 212 cycles.
Performance tests show that with 2X LP over-commitment we can get +2% perf
improvement for kernel build(Even more perf gain with more LPs).
Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
With all important informations now delivered through
tracepoints we can savely remove the nsvm_printk debugging
code for nested svm.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch adds a tracepoint for the event that the guest
executed the SKINIT instruction. This information is
important because SKINIT is an SVM extenstion not yet
implemented by nested SVM and we may need this information
for debugging hypervisors that do not yet run on nested SVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch adds a tracepoint for the event that the guest
executed the INVLPGA instruction.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch adds a special tracepoint for the event that a
nested #vmexit is injected because kvm wants to inject an
interrupt into the guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch adds a tracepoint for a nested #vmexit that gets
re-injected to the guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch adds a tracepoint for every #vmexit we get from a
nested guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch adds a dedicated kvm tracepoint for a nested
vmrun.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The nested SVM code emulates a #vmexit caused by a request
to open the irq window right in the request function. This
is a bug because the request function runs with preemption
and interrupts disabled but the #vmexit emulation might
sleep. This can cause a schedule()-while-atomic bug and is
fixed with this patch.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If event_inj is valid on a #vmexit the host CPU would write
the contents to exit_int_info, so the hypervisor knows that
the event wasn't injected.
We don't do this in nested SVM by now which is a bug and
fixed by this patch.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
For a while now, we are issuing a rdmsr instruction to find out which
msrs in our save list are really supported by the underlying machine.
However, it fails to account for kvm-specific msrs, such as the pvclock
ones.
This patch moves then to the beginning of the list, and skip testing them.
Cc: stable@kernel.org
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Push TF and RF injection and filtering on guest single-stepping into the
vender get/set_rflags callbacks. This makes the whole mechanism more
robust wrt user space IOCTL order and instruction emulations.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Disable paravirt MMU capability reporting, so that new (or rebooted)
guests switch to native operation.
Paravirt MMU is a burden to maintain and does not bring significant
advantages compared to shadow anymore.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Much of so far vendor-specific code for setting up guest debug can
actually be handled by the generic code. This also fixes a minor deficit
in the SVM part /wrt processing KVM_GUESTDBG_ENABLE.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Both VMX and SVM require per-cpu memory allocation, which is done at module
init time, for only online cpus.
Backend was not allocating enough structure for all possible CPUs, so
new CPUs coming online could not be hardware enabled.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
They are globals, not clearly protected by any ordering or locking, and
vulnerable to various startup races.
Instead, for variable TSC machines, register the cpufreq notifier and get
the TSC frequency directly from the cpufreq machinery. Not only is it
always right, it is also perfectly accurate, as no error prone measurement
is required.
On such machines, when a new CPU online is brought online, it isn't clear what
frequency it will start with, and it may not correspond to the reference, thus
in hardware_enable we clear the cpu_tsc_khz variable to zero and make sure
it is set before running on a VCPU.
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch replaces them with native_read_tsc() which can
also be used in expressions and saves a variable on the
stack in this case.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The exit_int_info field is only written by the hardware and
never read. So it does not need to be copied on a vmrun
emulation.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch reorganizes the logic in svm_interrupt_allowed to
make it better to read. This is important because the logic
is a lot more complicated with Nested SVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).
Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.
To circumvent this problem at least a bit, this patch introduces on
demand activation of virtualization. This means, that instead
virtualization is enabled on creation of the first virtual machine
and disabled on destruction of the last one.
So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
nested_svm_map unnecessarily takes mmap_sem around gfn_to_page, since
gfn_to_page / get_user_pages are responsible for it.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
- Change returned handle_invalid_guest_state() to return relevant exit codes
- Move triggering the emulation from vmx_vcpu_run() to vmx_handle_exit()
- Return to userspace instead of repeatedly trying to emulate instructions that have already failed
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This adds pusha and popa instructions (opcodes 0x60-0x61), this enables booting
MINIX with invalid guest state emulation on.
[marcelo: remove unused variable]
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add missing decoder flags for or instructions (0xc-0xd).
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The only thing it protects now is interrupt injection into lapic and
this can work lockless. Even now with kvm->irq_lock in place access
to lapic is not entirely serialized since vcpu access doesn't take
kvm->irq_lock.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Maintain back mapping from irqchip/pin to gsi to speedup
interrupt acknowledgment notifications.
[avi: build fix on non-x86/ia64]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This removes assumptions that max GSIs is smaller than number of pins.
Sharing is tracked on pin level not GSI level.
[avi: no PIC on ia64]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Introduces a new decode option "No64", which is used for instructions that are
invalid in long mode.
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Even it is in error path unlikely taken, add_timer_on() at
CPU_DOWN_FAILED* needs to be skipped if mce_timer is disabled.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: <stable@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This at once also gets the alignment specification right for
x86-64.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4B0FF8F80200007800022708@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Having run into the run-(boot-)time check a couple of times lately,
I finally took time to find a build-time check so that one doesn't
need to analyze the register/stack dump and resolve this (through
manual lookup in vmlinux) to the offending construct.
The assembler will emit a message like "Error: value of <num> too
large for field of 1 bytes at <offset>", which while not pointing
out the source location still makes analysis quite a bit easier.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4B0FF8AA0200007800022703@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fixup_irqs() already has a mdelay(). Remove the extra and
unnecessary mdelay() from cpu_disable_common().
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: ebiederm@xmission.com
Cc: garyhade@us.ibm.com
LKML-Reference: <20091201233335.232177348@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In the case when cpu goes offline, fixup_irqs() will forward any
unhandled interrupt on the offlined cpu to the new cpu
destination that is handling the corresponding interrupt. This
interrupt forwarding is done via IPI's. Hence, in this case also
level-triggered io-apic interrupt will be seen as an edge
interrupt in the cpu's APIC IRR.
Document this scenario in the code which handles this case by doing
an explicit EOI to the io-apic to clear remote IRR of the io-apic RTE.
Requested-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: ebiederm@xmission.com
Cc: garyhade@us.ibm.com
LKML-Reference: <20091201233335.143970505@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Maciej W. Rozycki reported:
> 82093AA I/O APIC has its version set to 0x11 and it
> does not support the EOI register. Similarly I/O APICs
> integrated into the 82379AB south bridge and the 82374EB/SB
> EISA component.
IO-APIC versions below 0x20 don't support EOI register.
Some of the Intel ICH Specs (ICH2 to ICH5) documents the io-apic
version as 0x2. This is an error with documentation and these
ICH chips use io-apic's of version 0x20 and indeed has a working
EOI register for the io-apic.
Fix the EOI register detection mechanism to check for version
0x20 and beyond.
And also, a platform can potentially have io-apic's with
different versions. Make the EOI register check per io-apic.
Reported-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: ebiederm@xmission.com
Cc: garyhade@us.ibm.com
LKML-Reference: <20091201233335.065361533@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When the level-triggered interrupt is seen as an edge interrupt,
we try to clear the remoteIRR explicitly (using either an
io-apic eoi register when present or through the idea of
changing trigger mode of the io-apic RTE to edge and then back
to level). But this explicit try also needs to happen before we
try to migrate the irq. Otherwise irq migration attempt will
fail anyhow, as it postpones the irq migration to a later
attempt when it sees the remoteIRR in the io-apic RTE still set.
Signed-off-by: "Maciej W. Rozycki" <macro@linux-mips.org>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: ebiederm@xmission.com
Cc: garyhade@us.ibm.com
LKML-Reference: <20091201233334.975416130@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we disable a breakpoint through dr7, we unregister it right
away, making us lose track of its corresponding address
register value.
It means that the following sequence would be unsupported:
- set address in dr0
- enable it through dr7
- disable it through dr7
- enable it through dr7
because we lost the address register value when we disabled the
breakpoint.
Don't unregister the disabled breakpoints but rather disable
them.
Reported-by: "K.Prasad" <prasad@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259735536-9236-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The semantics the PAT code expect of is_untracked_pat_range() is "is
this range completely contained inside the untracked region." This
means that checkin 8a27138924 was
technically wrong, because the implementation needlessly confusing.
The sane interface is for it to take a semiclosed range like just
about everything else (as evidenced by the sheer number of "- 1"'s
removed by that patch) so change the actual implementation to match.
Reported-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
copy_edd() should be __init.
warning msg:
WARNING: vmlinux.o(.text+0x7759): Section mismatch in reference from the
function copy_edd() to the variable .init.data:boot_params
The function copy_edd() references
the variable __initdata boot_params.
This is often because copy_edd lacks a __initdata
annotation or the annotation of boot_params is wrong.
Signed-off-by: ZhenwenXu <helight.xu@gmail.com>
LKML-Reference: <4B139F8F.4000907@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
commit 746357d (x86: Prevent GCC 4.4.x (pentium-mmx et al) function
prologue wreckage) uses -mtune=generic to work around the function
prologue problem with mcount on -march=pentium-mmx and others.
Jakub pointed out that we can use -maccumulate-outgoing-args instead
which is selected by -mtune=generic and prevents the problem without
losing the -march specific optimizations.
Pointed-out-by: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@kernel.org
Andrew complained rightly that the WARN_ON in hpet_next_event() is
confusing and the code comment not really helpful.
Change it to WARN_ONCE and print the reason in clear text. Change the
comment to explain what kind of hardware wreckage we deal with.
Pointed-out-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
The data that was stored in this table is now available in
dev->archdata.iommu. So this table is not longer necessary.
This patch removes the remaining uses of that variable and
removes it from the code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch removes the ugly contruct where the
iommu->lock must be released while before calling the
reset_iommu_command_buffer function.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch cleans up the code to flush device table entries
in the IOMMU. With this chance the driver can get rid of the
iommu_queue_inv_dev_entry() function.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch adds a function to flush a DTE entry for a given
struct device and replaces iommu_queue_inv_dev_entry calls
with this function where appropriate.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch cleans up the attach_device and detach_device
paths and fixes reference counting while at it.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch introduces a list to each protection domain which
keeps all devices associated with the domain. This can be
used later to optimize certain functions and to completly
remove the amd_iommu_pd_table.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch adds a reference count to each device to count
how often the device was bound to that domain. This is
important for single devices that act as an alias for a
number of others. These devices must stay bound to their
domains until all devices that alias to it are unbound from
the same domain.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch changes IOMMU code to use dev->archdata->iommu to
store information about the alias device and the domain the
device is attached to.
This allows the driver to get rid of the amd_iommu_pd_table
in the future.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch makes device isolation mandatory and removes
support for the amd_iommu=share option. This simplifies the
code in several places.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch rearranges two dma_ops related functions so that
their forward declarations are not longer necessary.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch moves alloc_pte() and fetch_pte() into the page
table handling code section so that the forward declarations
for them could be removed.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The logic of these two functions is reimplemented (at least
in parts) in places in the code. This patch removes these
code duplications and uses the functions instead. As a side
effect it moves check_device() to the helper function code
section.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This is a helper function and when its placed in the helper
function section we can remove its forward declaration.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
With the previous changes the get_device_resources function
can be simplified even more. The only important information
for the callers is the protection domain.
This patch renames the function to get_domain() and let it
only return the protection domain for a device.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
If there is no domain associated to a device yet and the
device has an alias device which already has a domain, the
original device needs to have the same domain as the alias
device.
This patch changes domain_for_device to handle this
situation and directly assigns the alias device domain to
the device in this situation.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
With the prior changes this parameter is not longer
required. This patch removes it from the function and all
callers.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Since the assumption that an dma_ops domain is only bound to
one IOMMU was given up we need to make alloc_new_range aware
of it.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Every call-place of get_device_resources calls check_device
before it. So call it from get_device_resources directly and
simplify the code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The check_device logic needs to include the dma_supported
checks to be really sure. Merge the dma_supported logic into
check_device and use it to implement dma_supported.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The non-present cache flag was IOMMU local until now which
doesn't make sense. Make this a global flag so we can remove
the lase user of 'struct iommu' in the map/unmap path.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch reimplements the function
flush_all_domains_on_iommu to use the global protection
domain list.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch reimplementes the amd_iommu_flush_all_domains
function to use the global protection domain list instead
of flushing every domain on every IOMMU.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch adds code to keep a global list of all protection
domains. This allows to simplify the resume code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This iommu_flush_tlb_pde function does essentially the same.
So the iommu_flush_domain function is redundant and can be
removed.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch extends the iommu_flush_pages function to flush
the TLB entries on all IOMMUs the domain has devices on.
This basically gives up the former assumption that dma_ops
domains are only bound to one IOMMU in the system.
For dma_ops domains this is still true but not for
IOMMU-API managed domains. Giving this assumption up for
dma_ops domains too allows code simplification.
Further it splits out the main logic into a generic function
which can be used by iommu_flush_tlb too.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch adds a function to the AMD IOMMU driver which
completes all queued commands an all IOMMUs a specific
domain has devices attached on. This is required in a later
patch when per-domain flushing is implemented.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch adds reference counting for protection domains
per IOMMU. This allows a smarter TLB flushing strategy.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch adds an index field to struct amd_iommu which can
be used to lookup it up in an array. This index will be used
in struct protection_domain to keep track which protection
domain has devices behind which IOMMU.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch updates the copyright headers in the relevant AMD
IOMMU driver files to match the date of the latest changes.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch moves all function declarations which are only
used inside the driver code to a seperate header file.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
In-kernel user breakpoints are created using functions in which
we pass breakpoint parameters as individual variables: address,
length and type.
Although it fits well for x86, this just does not scale across
archictectures that may support this api later as these may have
more or different needs. Pass in a perf_event_attr structure
instead because it is meant to evolve as much as possible into
a generic hardware breakpoint parameter structure.
Reported-by: K.Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259294154-5197-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If pat is disabled (boot with nopat), there's no need to create
debugfs for it, it's empty all the time.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1259236428-16329-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Explicitly mmap the UV chipset MMR address ranges used to
access blade-local registers. Although these same MMRs are also
mmaped at higher addresses, the low range is more
convenient when accessing blade-local registers.
The low range addresses always alias to the local blade
regardless of the blade id.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20091125162018.GA25445@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This prevents kernel threads from inheriting non-null segment
selectors, and causing optimizations in __switch_to() to be
ineffective.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Tim Blechmann <tim@klingt.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jan Beulich <JBeulich@novell.com>
LKML-Reference: <1259165856-3512-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make it readable in the source too, not just in the assembly output.
No change in functionality.
Cc: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1259176706-5908-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Zero the input register in the exception handler instead of
using an extra register to pass in a zero value.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1259176706-5908-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The mce_disable_cpu() and mce_reenable_cpu() are called only
from mce_cpu_callback() which is marked as __cpuinit.
So these functions can be __cpuinit too.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <4B0E3C4E.4090809@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Limit the number of per cpu TSC sync messages by only printing
to the console if an error occurs, otherwise print as a DEBUG
message.
The info message "Skipping synchronization ..." is only printed
after the last cpu has booted.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091118002222.181053000@alcatraz.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we schedule out a breakpoint from the cpu, we also
incidentally remove the "Global exact breakpoint" flag from the
breakpoint control register. It makes us losing the fine grained
precision about the origin of the instructions that may trigger
breakpoint exceptions for the other breakpoints running in this
cpu.
Reported-by: Prasad <prasad@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259211878-6013-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This simplifies the error handling when we create a breakpoint.
We don't need to check the NULL return value corner case anymore
since we have improved perf_event_create_kernel_counter() to
always return an error code in the failure case.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <1259210142-5714-3-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mtdblock driver doesn't call flush_dcache_page for pages in request. So,
this causes problems on architectures where the icache doesn't fill from
the dcache or with dcache aliases. The patch fixes this.
The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid
pointless empty cache-thrashing loops on architectures for which
flush_dcache_page() is a no-op. Every architecture was provided with this
flush pages on architectires where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is
equal 1 or do nothing otherwise.
See "fix mtd_blkdevs problem with caches on some architectures" discussion
on LKML for more information.
Signed-off-by: Ilya Loginov <isloginov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Peter Horton <phorton@bitbox.co.uk>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Make the initialization more readable, plus tidy up a few small
visual details as well.
No change in functionality.
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Percpu symbols now occupy the same namespace as other global
symbols and as such short global symbols without subsystem
prefix tend to collide with local variables. dr7 percpu
variable used by x86 was hit by this. Rename it to cpu_dr7.
The rename also makes it more consistent with its fellow
cpu_debugreg percpu variable.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20091125115856.GA17856@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
iommu=soft boot option forces the kernel to use swiotlb.
( This has the side-effect of enabling the swiotlb over the
GART if this boot option is provided. This is the desired
behavior of the swiotlb boot option and works like that
for all other hw-IOMMU drivers. )
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: yinghai@kernel.org
LKML-Reference: <20091125084611O.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The existing interface only has a pre-order callback. This change
adds an additional parameter for a post-order callback which will
be more useful for bus scans. ACPICA BZ 779.
Also update the external calls to acpi_walk_namespace.
http://www.acpica.org/bugzilla/show_bug.cgi?id=779
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This patch factors out the search for an MMCONFIG region, which was
previously implemented in both mmconfig_32 and mmconfig_64. No functional
change.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
No functional change; just tidy up printks and make them more consistent
with the rest of PCI.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This is only used internally now, but eventually will be used in the
hot-remove path to remove the MMCONFIG region associated with a host bridge.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This changes pci_mmcfg_region from a table to a list, to make it easier
to add and remove MMCONFIG regions for PCI host bridge hotplug.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This replaces "typeof(pci_mmcfg_config[0])" with the actual type because
I plan to convert pci_mmcfg_config to a list, and then "pci_mmcfg_config[0]"
won't mean anything.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The virtual address is only used for x86_64, but it's so much simpler
to manage it as part of the pci_mmcfg_region that I think it's worth
wasting a pointer per MMCONFIG region on x86_32.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since pci_mmcfg_region contains the struct resource, no need to pass the
pci_mmcfg_region *and* the resource start/size.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch adds a resource and corresponding name to the MMCONFIG
structure. This makes allocation simpler (we can allocate the
resource and name at the same time we allocate the pci_mmcfg_region),
and gives us a way to hang onto the resource after inserting it.
This will be needed so we can release and free it when hot-removing
a host bridge.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
No functional change, but simplifies a future patch to convert the table
to a list.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This only renames the struct pci_mmcfg_region members; no functional change.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This adds a struct pci_mmcfg_region with a little more information
than the struct acpi_mcfg_allocation used previously. The acpi_mcfg
structure is defined by the spec, so we can't change it.
To begin with, struct pci_mmcfg_region is basically the same as the
ACPI MCFG version, but future patches will add more information.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This factors out the common "bus << 20" expression used when computing the
MMCONFIG address.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since all MMCONFIG regions go through pci_mmconfig_add(), we can test the
address once there. If the caller supplies an address of zero, we never
insert it in the pci_mmcfg_config[] table, so no need to test it elsewhere.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We never set pci_mmcfg_config unless we increment pci_mmcfg_config_num,
so there's no need to test both pci_mmcfg_config_num and pci_mmcfg_config.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch encapsulate pci_mmcfg_config[] updates. All alloc/free is now
done in pci_mmconfig_add() and free_all_mcfg(), so all updates to
pci_mmcfg_config[] and pci_mmcfg_config_num are in those two functions.
This replaces the previous sequence of extend_mmcfg(), fill_one_mmcfg()
with the single pci_mmconfig_add() interface. This interface is currently
static but will eventually be used in the host bridge hot-add path.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Step through the ACPI MCFG table, not pci_mmcfg_config[]. No functional
change, but simplifies future patches that encapsulate pci_mmcfg_config[].
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use a local variable, not pci_mmcfg_config_num, to count MCFG entries.
No functional change, but simplifies future changes.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Those functions are used by intel_bus.c so seperate them to another file. and
make amd_bus a bit smaller.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
commit db635adc turned -DDEBUG for x86/pci on when CONFIG_PCI_DEBUG
is set. In general, I agree with that change.
However, it exposes a bunch of very low level PCI debugging in the
early x86 path, such as:
0 reading 2 from a: ffff
1 reading 2 from a: ffff
2 reading 2 from a: ffff
3 reading 2 from a: 300
3 reading 2 from 0: 1002
3 reading 2 from 2: 515e
These statements add a lot of noise to the boot and aren't likely to
be necessary even when handling random upstream bug reports.
[In contrast, statements such as these:
pci 0000:02:04.0: found [14e4:164a] class 000200 header type 00
pci 0000:02:04.0: reg 10: [mem 0xf8000000-0xf9ffffff 64bit]
pci 0000:02:04.0: reg 30: [mem 0x00000000-0x0001ffff pref]
are indeed useful when remote debugging users' machines]
Remove the noisy printks and save electrons everywhere.
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
In some cases we can coalesce MTRR entries after cleanup; this may
allow us to have more entries. As such, introduce clean_sort_range to
to sort and coaelsce the MTRR entries.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B0BB9A3.5020908@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This interface is mainly intended (and implemented) for ACPI _PPC BIOS
frequency limitations, but other cpufreq drivers can also use it for
similar use-cases.
Why is this needed:
Currently it's not obvious why cpufreq got limited.
People see cpufreq/scaling_max_freq reduced, but this could have
happened by:
- any userspace prog writing to scaling_max_freq
- thermal limitations
- hardware (_PPC in ACPI case) limitiations
Therefore export bios_limit (in kHz) to:
- Point the user that it's the BIOS (broken or intended) which limits
frequency
- Export it as a sysfs interface for userspace progs.
While this was a rarely used feature on laptops, there will appear
more and more server implemenations providing "Green IT" features like
allowing the service processor to limit the frequency. People want
to know about HW/BIOS frequency limitations.
All ACPI P-state driven cpufreq drivers are covered with this patch:
- powernow-k8
- powernow-k7
- acpi-cpufreq
Tested with a patched DSDT which limits the first two cores (_PPC returns 1)
via _PPC, exposed by bios_limit:
# echo 2200000 >cpu2/cpufreq/scaling_max_freq
# cat cpu*/cpufreq/scaling_max_freq
2600000
2600000
2200000
2200000
# #scaling_max_freq shows general user/thermal/BIOS limitations
# cat cpu*/cpufreq/bios_limit
2600000
2600000
2800000
2800000
# #bios_limit only shows the HW/BIOS limitation
CC: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com>
CC: Len Brown <lenb@kernel.org>
CC: davej@codemonkey.org.uk
CC: linux@dominikbrodowski.net
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
The "unsigned int processor" everywhere confused Rusty, leading to
breakage when he passed in smp_processor_id().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Set the transition latency to value smaller than CPUFREQ_ETERNAL so
governors other than "performance" work (like the "ondemand" one).
The value is found in "AMD PowerNow! Technology Platform Design Guide for
Embedded Processors" dated December 2000 (AMD doc #24267A). There is the
answer to one of FAQs on page 40 which states that suggested complete transition
period is 200 us.
Tested on K6-2+ CPU with K6-3 core (model 13, stepping 4).
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
It's still mugging the current process's cpumask, but as comment in
1ff6e97f1d says, it's not a trivial fix.
So, at least we can use a cpumask_var_t to do the Wrong Thing the Right Way :)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: cpufreq@vger.kernel.org
Cc: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
In commit 0de51088e6, we introduced the
use of acpi-cpufreq on VIA/Centaur CPU's by removing a vendor check for
VENDOR_INTEL. However, as it turns out, at least the Nano CPU's also
need the PDC (processor driver capabilities) handshake in order to
activate the methods required for acpi-cpufreq.
Since arch_acpi_processor_init_pdc() contains another vendor check for
Intel, the PDC is not initialized on VIA CPU's. The resulting behavior
of a current mainline kernel on such systems is: acpi-cpufreq
loads and it indicates CPU frequency changes. However, the CPU stays at
a single frequency
This trivial patch ensures that init_intel_pdc() is called on Intel and
VIA/Centaur CPU's alike.
Signed-off-by: Harald Welte <HaraldWelte@viatech.com>
Signed-off-by: Dave Jones <davej@redhat.com>
The validate_event() was failing on valid event combinations. The
function was assuming that if x86_schedule_event() returned 0, it
meant error. But x86_schedule_event() returns the counter index and
0 is a perfectly valid value. An error is returned if the function
returns a negative value.
Furthermore, validate_event() was also failing for event groups
because the event->pmu was not set until after
hw_perf_event_init().
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: paulus@samba.org
Cc: perfmon2-devel@lists.sourceforge.net
Cc: eranian@gmail.com
LKML-Reference: <4b0bdf36.1818d00a.07cc.25ae@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--
arch/x86/kernel/cpu/perf_event.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Move the find_smp_config() call to before bootmem is initialized.
Use reserve_early() instead of reserve_bootmem() in it.
This simplifies the code, we only need to call find_smp_config()
once and can remove the now unneeded reserve parameter from
x86_init_mpparse::find_smp_config.
We thus also reduce x86's dependency on bootmem allocations.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B0BB9F2.70907@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- Change is_untracked_pat_range() to return bool.
- Clean up the initialization of is_untracked_pat_range() -- by default,
we simply point it at is_ISA_range() directly.
- Move is_untracked_pat_range to the end of struct x86_platform, since
it is the newest field.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
Change is_ISA_range() from a macro to an inline function. This makes
it type safe, and also allows it to be assigned to a function pointer
if necessary.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <20091119202341.GA4420@sgi.com>
is_untracked_pat_range() -- like its components, is_ISA_range() and
is_GRU_range(), takes a normal semiclosed interval (>=, <) whereas the
PAT code called it as if it took a closed range (>=, <=). Fix.
Although this is a bug, I believe it is non-manifest, simply because
none of the callers will call this with non-page-aligned addresses.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
Checkin fd12a0d69a made the PAT
untracked range a platform configurable, but missed on occurrence of
is_ISA_range() which still refers to PAT-untracked memory, and
therefore should be using the configurable.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
display_cacheinfo() doesn't display anything anymore and it is used to
detect CPU cache sizes. Rename it accordingly.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
LKML-Reference: <20091121130145.GA31357@liondog.tnic>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
GRU space is always mapped as WB in the page table. There is
no need to track the mappings in the PAT. This also eliminates
the "freeing invalid memtype" messages when the GRU space is
unmapped.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
[ v2: fix build failure ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
irq_thermal_count is only being maintained when
X86_THERMAL_VECTOR, and both X86_THERMAL_VECTOR and
X86_MCE_THRESHOLD don't need extra wrapping in X86_MCE
conditionals.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Yong Wang <yong.y.wang@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Arjan van de Ven <arjan@infradead.org>
LKML-Reference: <4B06AFA902000078000211F8@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
A memory mapped register that affects the SGI UV Broadcast
Assist Unit's interrupt handling may sometimes be unintialized.
Remove the condition on its initialization, as that condition
can be randomly satisfied by a hardware reset.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
LKML-Reference: <E1NBGB9-0005nU-Dp@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Attribute authorship to developers of hw-breakpoint related
files.
Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091123154713.GA5593@in.ibm.com>
[ v2: moved it to latest -tip ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Lbswap_mask, Lpoly and Ltwo_one should clearly belong to
.data section, not .text.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Old binutils do not support PCLMULQDQ-NI and PSHUFB, to make kernel
can be compiled by them, .byte code is used instead of assembly
instructions. But the readability and flexibility of raw .byte code is
not good.
So corresponding assembly instruction like gas macro is used instead.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
For some devices the ACPI table may define unity map
requirements which must me met when the IOMMU is enabled. So
we need to attach devices to their domains as early as
possible so that these mappings are in place when needed.
This patch assigns the domains right after they are
allocated. Otherwise this can result in I/O page faults
before a driver binds to a device and BIOS is still using
it.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Old binutils do not support AES-NI instructions, to make kernel can be
compiled by them, .byte code is used instead of AES-NI assembly
instructions. But the readability and flexibility of raw .byte code is
not good.
So corresponding assembly instruction like gas macro is used instead.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This function may be called on the resume path and can not
be dropped after booting.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
init_task doesn't get its stack end location set to
STACK_END_MAGIC, and hence the message is confusing
rather than helpful in this case.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4B06AEFE02000078000211F4@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Decreases perf overhead when function tracing is enabled,
by about 50%.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
CPU to node mapping is set via the following sequence:
1. numa_init_array(): Set up roundrobin from cpu to online node
2. init_cpu_to_node(): Set that according to apicid_to_node[]
according to srat only handle the node that
is online, and leave other cpu on node
without ram (aka not online) to still
roundrobin.
3. later call srat_detect_node for Intel/AMD, will use first_online
node or nearby node.
Problem is that setup_per_cpu_areas() is not called between 2 and 3,
the per_cpu for cpu on node with ram is on different node, and could
put that on node with two hops away.
So try to optimize this and add find_near_online_node() and call
init_cpu_to_node().
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4B07A739.3030104@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In the NUMA bootmem setup failure path we freed nodedata_phys
incorrectly.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4B07A739.3030104@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make it consistent with APIC MADT print out,
for big systems APIC id in hex is more readable.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B07A739.3030104@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When irq_desc is moved, we need to make sure to use the right cfg_new.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B07A739.3030104@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Suresh made dmar_table_init() already have that protection.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B07A739.3030104@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
use find_e820_area()/reserve_early() instead.
-v2: address Eric's request, to restore original semantics.
will fail, if the provided address can not be used.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <4B09E2F9.7040403@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/x86/kernel/kprobes.c
kernel/trace/Makefile
Merge reason: hw-breakpoints perf integration is looking
good in testing and in reviews, plus conflicts
are mounting up - so merge & resolve.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Check objdump version before using it for insn decoder build test,
because some older objdump can't decode AVX code correctly.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Jim Keniston <jkenisto@us.ibm.com>
LKML-Reference: <20091120171314.6715.30390.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Fix postest_verbose to posttest_verbose, and add posttest_64bit option
for CONFIG_64BIT != y, since old command just passed '-' instead
of '-n' when CONFIG_64BIT is not set.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Jim Keniston <jkenisto@us.ibm.com>
LKML-Reference: <20091120171307.6715.66099.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
When the kernel is compiled with -pg for tracing GCC 4.4.x inserts
stack alignment of a function _before_ the mcount prologue if the
-march=pentium-mmx is set and -mtune=generic is not set. This breaks
the assumption of the function graph tracer which expects that the
mcount prologue
push %ebp
mov %esp, %ebp
is the first stack operation in a function because it needs to modify
the function return address on the stack to trap into the tracer
before returning to the real caller.
The generated code is:
push %edi
lea 0x8(%esp),%edi
and $0xfffffff0,%esp
pushl -0x4(%edi)
push %ebp
mov %esp,%ebp
so the tracer modifies the copy of the return address which is stored
after the stack alignment and therefor does not trap the return which
in turn breaks the call chain logic of the tracer and leads to a
kernel panic.
Aside of the fact that the generated code is horrible for no good
reason other -march -mtune options generate the expected:
push %ebp
mov %esp,%ebp
and $0xfffffff0,%esp
which does the same and keeps everything intact.
After some experimenting we found out that this problem is restricted
to gcc4.4.x and to the following -march settings:
i586, pentium, pentium-mmx, k6, k6-2, k6-3, winchip-c6, winchip2, c3,
geode
By adding -mtune=generic the code generator produces always the
expected code.
So forcing -mtune=generic when CONFIG_FUNCTION_GRAPH_TRACER=y is not
pretty, but at the moment the only way to prevent that the kernel
trips over gcc-shrooms induced code madness.
Most distro kernels have CONFIG_X86_GENERIC=y anyway which forces
-mtune=generic as well so it will not impact those.
References: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109http://lkml.org/lkml/2009/11/19/17
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <alpine.LFD.2.00.0911200206570.24119@localhost.localdomain>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
Cc: Jeff Law <law@redhat.com>
Cc: gcc@gcc.gnu.org
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Andrew Haley <aph@redhat.com>
Cc: Richard Guenther <richard.guenther@gmail.com>
Cc: stable@kernel.org
Since some instructions are not decoded correctly by older
versions of objdump, it may cause false positive error in insn
decoder posttest.
This changes build error of insn decoder test to build warning.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20091116230631.5250.41579.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rather than having X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
(with inconsistent defaults), just having the latter suffices as
the former can be easily calculated from it.
To be consistent, also change X86_INTERNODE_CACHE_BYTES to
X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA
to account for last level cache line size (which here matters
more than L1 cache line size).
Finally, make sure the default value for X86_L1_CACHE_SHIFT,
when X86_GENERIC is selected, is being seen before that for the
individual CPU model options (other than on x86-64, where
GENERIC_CPU is part of the choice construct, X86_GENERIC is a
separate option on ix86).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Ravikiran Thirumalai <kiran@scalex86.org>
Acked-by: Nick Piggin <npiggin@suse.de>
LKML-Reference: <4AFD5710020000780001F8F0@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
clockevents.mult became u32. Fix the printk format.
Pointed-out-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The typename member of struct irq_chip was kept for migration purposes
and is obsolete since more than 2 years. Fix up the leftovers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
"[CPUFREQ] cpumask: avoid playing with cpus_allowed in speedstep-ich.c"
changed the code to mistakenly pass the current cpu as the "processor"
argument of speedstep_get_frequency(), whereas it should be the type of
the processor.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14340
Based on a patch by Dave Mueller.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Dominik Brodowski <linux@brodo.de>
Reported-by: Dave Mueller <dave.mueller@gmx.ch>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Removing the SMT/HT check, since the Errata doesn't mention
Hyper-Threading.
Adding in a printk, so that the user knows why acpi-cpufreq refuses to
load. Also, once system is blacklisted, don't repeat checks to see if
blacklisted. This also causes the message to only be printed once,
rather than for each CPU.
Signed-off-by: John L. Villalovos <john.l.villalovos@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
There is a typo in the longhaul detection code so only Longhaul v1 or Longhaul v3
is selected. The Longhaul v2 is not selected even for CPUs which are capable of.
Tested on PCChips Giga Pro board. Frequency changes work and the Longhaul v2
detects that the board is not capable of changing CPU voltage.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
The current MTRR code treats WP as a form of UC. This really isn't
desirable behaviour, except possibly in the case of severe MTRR
shortage. Disable this, to allow legitimate uses of WP to remain
unmolested.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier
to struct timekeeper" the clock multiplier of vsyscall is updated with
the unmodified clock multiplier of the clock source and not with the
NTP adjusted multiplier of the timekeeper.
This causes user space observerable time warps:
new CLOCK-warp maximum: 120 nsecs, 00000025c337c537 -> 00000025c337c4bf
Add a new argument "mult" to update_vsyscall() and hand in the
timekeeping internal NTP adjusted multiplier.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tony Luck <tony.luck@intel.com>
LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Resolve the conflict between v2.6.32-rc7 where dn_def_dev_handler
gets a small bug fix and the sysctl tree where I am removing all
sysctl strategy routines.
This kills bad_dma_address variable, the old mechanism to enable
IOMMU drivers to make dma_mapping_error() work in IOMMU's
specific way.
bad_dma_address variable was introduced to enable IOMMU drivers
to make dma_mapping_error() work in IOMMU's specific way.
However, it can't handle systems that use both swiotlb and HW
IOMMU. SO we introduced dma_map_ops->mapping_error to solve that
case.
Intel VT-d, GART, and swiotlb already use
dma_map_ops->mapping_error. Calgary, AMD IOMMU, and nommu use
zero for an error dma address. This adds DMA_ERROR_CODE and
converts them to use it (as SPARC and POWER does).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: muli@il.ibm.com
Cc: joerg.roedel@amd.com
LKML-Reference: <1258287594-8777-3-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
GART IOMMU is the only user of bad_dma_address variable.
This patch converts GART to use the newer mechanism, fill in
->mapping_error() in struct dma_map_ops, to make
dma_mapping_error() work in IOMMU specific way.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: muli@il.ibm.com
Cc: joerg.roedel@amd.com
LKML-Reference: <1258287594-8777-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Show symbol name if insn decoder test find a difference.
This will help us to find out where the issue is.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20091116230624.5250.49813.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add verbose option to insn decoder test. This dumps decoded
instruction when building kernel with V=1.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20091116230618.5250.18762.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is possible for x86_64 systems to lack the NX bit either due to the
hardware lacking support or the BIOS having turned off the CPU capability,
so NX status should be reported. Additionally, anyone booting NX-capable
CPUs in 32bit mode without PAE will lack NX functionality, so this change
provides feedback for that case as well.
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1258154897-6770-6-git-send-email-hpa@zytor.com>
The 32- and 64-bit code used very different mechanisms for enabling
NX, but even the 32-bit code was enabling NX in head_32.S if it is
available. Furthermore, we had a bewildering collection of tests for
the available of NX.
This patch:
a) merges the 32-bit set_nx() and the 64-bit check_efer() function
into a single x86_configure_nx() function. EFER control is left
to the head code.
b) eliminates the nx_enabled variable entirely. Things that need to
test for NX enablement can verify __supported_pte_mask directly,
and cpu_has_nx gives the supported status of NX.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Chris Wright <chrisw@sous-sol.org>
LKML-Reference: <1258154897-6770-5-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
Make set_memory_x/set_memory_nx directly aware of if NX is supported
in the system or not, rather than requiring that every caller assesses
that support independently.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tim Starling <tstarling@wikimedia.org>
Cc: Hannes Eder <hannes@hanneseder.net>
LKML-Reference: <1258154897-6770-4-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
Always save the value of EFER, regardless of the state of NX. Since
EFER may not actually exist, use rdmsr_safe() to do so.
v2: check the return value from rdmsr_safe() instead of relying on
the output values being unchanged on error.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@tuxonice.net>
LKML-Reference: <1258154897-6770-3-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
Use symbolic constants rather than hard-coded values when setting
EFER.NX in head_32.S, and do a more rigorous test for the validity of
the response when probing for the extended CPUID range.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1258154897-6770-2-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
If IO-APIC base address is 1K aligned we should not fail
on resourse insertion procedure. For this sake we define
IO_APIC_SLOT_SIZE constant which should cover all IO-APIC
direct accessible registers.
An example of a such configuration is there
http://marc.info/?l=linux-kernel&m=118114792006520
|
| Quoting the message
|
| IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
| IOAPIC[1]: apic_id 3, version 32, address 0xfec80000, GSI 24-47
| IOAPIC[2]: apic_id 4, version 32, address 0xfec80400, GSI 48-71
| IOAPIC[3]: apic_id 5, version 32, address 0xfec84000, GSI 72-95
| IOAPIC[4]: apic_id 8, version 32, address 0xfec84400, GSI 96-119
|
Reported-by: "Maciej W. Rozycki" <macro@linux-mips.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20091116151426.GC5653@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On x86-64, copy_[to|from]_user() rely on assembly routines that
never call might_fault(), making us missing various lockdep
checks.
This doesn't apply to __copy_from,to_user() that explicitly
handle these calls, neither is it a problem in x86-32 where
copy_to,from_user() rely on the "__" prefixed versions that
also call might_fault().
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1258382538-30979-1-git-send-email-fweisbec@gmail.com>
[ v2: fix module export ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix panic seen on some IBM and HP systems on 2.6.32-rc6:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c
[...]
[<ffffffff8120bbde>] cpumask_next_and+0x2e/0x3b
[<ffffffff81225c62>] pci_device_probe+0x8e/0xf5
[<ffffffff812b9be6>] ? driver_sysfs_add+0x47/0x6c
[<ffffffff812b9da5>] driver_probe_device+0xd9/0x1f9
[<ffffffff812b9f1d>] __driver_attach+0x58/0x7c
[<ffffffff812b9ec5>] ? __driver_attach+0x0/0x7c
[<ffffffff812b9298>] bus_for_each_dev+0x54/0x89
[<ffffffff812b9b4f>] driver_attach+0x19/0x1b
[<ffffffff812b97ae>] bus_add_driver+0xd3/0x23d
[<ffffffff812ba1e7>] driver_register+0x98/0x109
[<ffffffff81225ed0>] __pci_register_driver+0x63/0xd3
[<ffffffff81072776>] ? up_read+0x26/0x2a
[<ffffffffa0081000>] ? k8temp_init+0x0/0x20 [k8temp]
[<ffffffffa008101e>] k8temp_init+0x1e/0x20 [k8temp]
[<ffffffff8100a073>] do_one_initcall+0x6d/0x185
[<ffffffff8108d765>] sys_init_module+0xd3/0x236
[<ffffffff81011ac2>] system_call_fastpath+0x16/0x1b
I put in a printk and commented out the set_dev_node()
call when and got this output:
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x0
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x1
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x2
quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x3
I.e. the issue appears to be that the HW has set val to a valid
value, however, the system is only configured for a single
node -- 0, the others are offline.
Check to see if the node is actually online before setting
the numa node for an AMD northbridge in quirk_amd_nb_node().
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: bhavna.sarathy@amd.com
Cc: jbarnes@virtuousgeek.org
Cc: andreas.herrmann3@amd.com
LKML-Reference: <20091112180933.12532.98685.sendpatchset@prarit.bos.redhat.com>
[ v2: clean up the code and add comments ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add debugobject support to track the life time of work_structs.
While at it, remove duplicate definition of
INIT_DELAYED_WORK_ON_STACK().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
This v2.6.26 commit:
ad2fc2c: x86: fix copy_user on x86
rendered __copy_from_user_inatomic() identical to
copy_user_generic(), yet didn't make the former just call the
latter from an inline function.
Furthermore, this v2.6.19 commit:
b885808: [PATCH] Add proper sparse __user casts to __copy_to_user_inatomic
converted the return type of __copy_to_user_inatomic() from
unsigned long to int, but didn't do the same to
__copy_from_user_inatomic().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <v.mayatskih@gmail.com>
LKML-Reference: <4AFD5778020000780001F8F4@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This makes calgary_iommu_init() static and moves it to remove
the forward declaration.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: muli@il.ibm.com
LKML-Reference: <20091114212603U.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
iommu_init_noop() is in arch/x86/kernel/x86_init.c but
iommu_shutdown_noop() in arch/x86/include/asm/iommu.h.
This moves iommu_shutdown_noop() to x86_init.c for consistency.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1258199198-16657-3-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We set dma_ops to nommu_dma_ops at two different places for
x86_32 and x86_64. This unifies them by setting dma_ops to
nommu_dma_ops by default.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1258199198-16657-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This build error:
arch/x86/kvm/x86.c:3655: error: implicit declaration of function 'hw_breakpoint_restore'
Happens because in the CONFIG_KVM=m case there's no 'CONFIG_KVM' define
in the kernel - it's CONFIG_KVM_MODULE in that case.
Make the prototype available unconditionally.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <1258114575-32655-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu noticed that this commit:
0388423: x86: Minimise printk spew from per-vendor init code
mistakenly left out the initialization of cpu_devs[] in the
!PROCESSOR_SELECT case. Fix it.
Reported-by: Yinghai Lu <yinghai@kernel.org>
Cc: Dave Jones <davej@redhat.com>
LKML-Reference: <20091113203000.GA19160@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
As Dave Jones said about the output in intel_cacheinfo.c: "They
aren't useful, and pollute the dmesg output a lot (especially on
machines with many cores). Also the same information can be
trivially found out from userspace."
Give the generic display_cacheinfo() function the same treatment.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Dave Jones <davej@redhat.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <adaocn6dp99.fsf_-_@roland-alpha.cisco.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In the default case where the kernel supports all CPU vendors,
we currently print out a bunch of not useful messages on every
system.
32-bit:
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
NSC Geode by NSC
Cyrix CyrixInstead
Centaur CentaurHauls
Transmeta GenuineTMx86
Transmeta TransmetaCPU
UMC UMC UMC UMC
64-bit:
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
Centaur CentaurHauls
Given that "what CPUs does the kernel support" isn't useful for
the "support everything" case, we can suppress these printk's.
Signed-off-by: Dave Jones <davej@redhat.com>
LKML-Reference: <20091113203000.GA19160@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a field to store the boot cursor state and implement this for VGA on
x86. This can then be used to set the default policy for the boot console.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
LKML-Reference: <1258142222-16092-1-git-send-email-mjg@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
They aren't really useful, and they pollute the dmesg output a lot
(especially on machines with many cores).
Also the same information can be trivially found out from
userspace.
Reported-by: Mike Travis <travis@sgi.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091112231542.GA7129@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that sys_sysctl is a generic wrapper around /proc/sys .ctl_name
and .strategy members of sysctl tables are dead code. Remove them.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
The intel_init_thermal() is called from resume path, so it
cannot be marked as __init.
OTOH mce_banks_init() is only called from
__mcheck_cpu_cap_init() which is marked as __cpuinit, so it can
be also marked as __cpuinit.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Yong Wang <yong.y.wang@linux.intel.com>
LKML-Reference: <4AFBB0B8.2070501@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86/PCI: Adjust GFP mask handling for coherent allocations
PCI ASPM: fix oops on root port removal
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, amd-ucode: Check UCODE_MAGIC before loading the container file
x86: Fix error return sequence in __ioremap_caller()
x86: Add Phoenix/MSC BIOSes to lowmem corruption list
Instead of using bootmem, try find_e820_area()/reserve_early(),
and call acpi_reserve_memory() early, to allocate the wakeup
trampoline code area below 1M.
This is more reliable, and it also removes a dependency on
bootmem.
-v2: change function name to acpi_reserve_wakeup_memory(),
as suggested by Rafael.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: pm list <linux-pm@lists.linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4AFA210B.3020207@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When switching a CPU offline/online and then doing
suspend/resume, ucode is not updated on this CPU.
This is due to the microcode_fini_cpu() call which frees uci->mc
when setting the CPU offline:
static void microcode_fini_cpu_amd(int cpu)
{
struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
vfree(uci->mc);
uci->mc = NULL;
}
When the CPU is set online uci->mc is still NULL because no
ucode update is required.
Finally this prevents ucode update when resuming after suspend:
static enum ucode_state microcode_resume_cpu(int cpu)
{
struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
if (!uci->mc)
return UCODE_NFOUND;
...
}
Fix is to check whether uci->mc is valid before
microcode_resume_cpu() is called.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: dimm <dmitry.adamushko@gmail.com>
LKML-Reference: <20091111190329.GF18592@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
POWERPC doesn't expect it to be used.
This fixes the linux-next build failure reported by
Stephen Rothwell:
lib/swiotlb.c: In function 'setup_io_tlb_npages':
lib/swiotlb.c:114: error: 'swiotlb' undeclared (first use in this function)
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: peterz@infradead.org
LKML-Reference: <20091112000258F.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mark the thermal init functions __init so that the init memory
can be freed.
Signed-off-by: Yong Wang <yong.y.wang@intel.com>
LKML-Reference: <20091111075125.GA17900@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
COMPAT_VDSO has 2 help text blocks, but kconfig only uses the
last one found, so merge the 2 blocks.
It would be real nice if kconfig would warn about this.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <4AF9FB6C.70003@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I double-checked the datasheet. One of the existing
descriptors has a typo: it should be 2MB not 2038 KB.
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: <stable@kernel.org> # .3x.x: 85160b9: x86: Add new Intel CPU cache size descriptors
Cc: <stable@kernel.org> # .3x.x
LKML-Reference: <20091110200120.GA27090@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The latest rev of Intel doc AP-485 details new cache descriptors
that we don't yet support. 12MB, 18MB and 24MB 24-way assoc L3
caches.
Signed-off-by: Dave Jones <davej@redhat.com>
LKML-Reference: <20091110184924.GA20337@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Most of the time x86_init.h is included in pci-dma.c - but not always,
leading to this rare build failure:
arch/x86/kernel/pci-dma.c:296: error: 'x86_init' undeclared (first use in this function)
So include asm/x86_init.h explicitly.
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If HW IOMMU initialization fails (Intel VT-d often does this,
typically due to BIOS bugs), we fall back to nommu. It doesn't
work for the majority since nowadays we have more than 4GB
memory so we must use swiotlb instead of nommu.
The problem is that it's too late to initialize swiotlb when HW
IOMMU initialization fails. We need to allocate swiotlb memory
earlier from bootmem allocator. Chris explained the issue in
detail:
http://marc.info/?l=linux-kernel&m=125657444317079&w=2
The current x86 IOMMU initialization sequence is too complicated
and handling the above issue makes it more hacky.
This patch changes x86 IOMMU initialization sequence to handle
the above issue cleanly.
The new x86 IOMMU initialization sequence are:
1. we initialize the swiotlb (and setting swiotlb to 1) in the case
of (max_pfn > MAX_DMA32_PFN && !no_iommu). dma_ops is set to
swiotlb_dma_ops or nommu_dma_ops. if swiotlb usage is forced by
the boot option, we finish here.
2. we call the detection functions of all the IOMMUs
3. the detection function sets x86_init.iommu.iommu_init to the
IOMMU initialization function (so we can avoid calling the
initialization functions of all the IOMMUs needlessly).
4. if the IOMMU initialization function doesn't need to swiotlb
then sets swiotlb to zero (e.g. the initialization is
sucessful).
5. if we find that swiotlb is set to zero, we free swiotlb
resource.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-10-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This enables us to avoid printing swiotlb memory info when we
initialize swiotlb. After swiotlb initialization, we could find
that we don't need swiotlb.
This patch removes the code to print swiotlb memory info in
swiotlb_init() and exports the function to do that.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
Cc: tony.luck@intel.com
Cc: benh@kernel.crashing.org
LKML-Reference: <1257849980-22640-9-git-send-email-fujita.tomonori@lab.ntt.co.jp>
[ -v2: merge up conflict ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This changes detect_intel_iommu() to set intel_iommu_init() to
iommu_init hook if detect_intel_iommu() finds the IOMMU.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-6-git-send-email-fujita.tomonori@lab.ntt.co.jp>
[ -v2: build fix for the !CONFIG_DMAR case ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This changes amd_iommu_detect() to set amd_iommu_init to
iommu_init hook if amd_iommu_detect() finds the AMD IOMMU.
We can kill the code to check if we found the IOMMU in
amd_iommu_init() since amd_iommu_detect() sets amd_iommu_init()
only when it found the IOMMU.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-5-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This changes gart_iommu_hole_init() to set gart_iommu_init() to
iommu_init hook if gart_iommu_hole_init() finds the GART IOMMU.
We can kill the code to check if we found the IOMMU in
gart_iommu_init() since gart_iommu_hole_init() sets
gart_iommu_init() only when it found the IOMMU.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-4-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This changes detect_calgary() to set init_calgary() to
iommu_init hook if detect_calgary() finds the Calgary IOMMU.
We can kill the code to check if we found the IOMMU in
init_calgary() since detect_calgary() sets init_calgary() only
when it found the IOMMU.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
LKML-Reference: <1257849980-22640-3-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We call the detections functions of all the IOMMUs then all
their initialization functions. The latter is pointless since we
don't detect multiple different IOMMUs. What we need to do is
calling the initialization function of the detected IOMMU.
This adds iommu_init hook to x86_init_ops so if an IOMMU
detection function can set its initialization function to the
hook.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no point in warning when there is no ucode available
for a specific CPU revision. Currently the container-file, which
provides the AMD ucode patches for OS load, contains only a few
ucode patches.
It's already clearly indicated by the printed patch_level
whenever new ucode was available and an update happened. So the
warning message is of no help but rather annoying on systems
with many CPUs.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: dimm <dmitry.adamushko@gmail.com>
LKML-Reference: <20091110110825.GI30802@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This also implies that corresponding log messages, e.g.
platform microcode: firmware: requesting amd-ucode/microcode_amd.bin
show up only once on module load and not when ucode is updated
for each CPU.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: dimm <dmitry.adamushko@gmail.com>
LKML-Reference: <20091110110723.GH30802@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Wrap in the cpu dr7 check that tells if we have active
breakpoints that need to be restored in the cpu.
This wrapper makes the check more self-explainable and also
reusable for any further other uses.
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
Fix the broken a.out format dump. For now we only dump the ptrace
breakpoints.
TODO: Dump every perf breakpoints for the current thread, not only
ptrace based ones.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
Commit:
b6ff32d: x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype()
consolidated reserve_memtype() and pat_x_mtrr_type,
this made ioremap_default() same as ioremap_cache().
Remove the redundant function and change the only caller to use
ioremap_cache.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <1257845005-7938-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Commit:
b6ff32d: x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype()
consolidated code in pat_x_mtrr_type() and reserve_memtype(),
which removed the special case (req_type is -1) for the
PAT-enabled part.
We should also change comments and the PAT-disabled part.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <1257844987-7906-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On platforms where the BIOS handles the thermal monitor interrupt,
APIC_LVTTHMR on each logical CPU is programmed to generate a SMI
and OS must not touch it.
Unfortunately AP bringup sequence using INIT-SIPI-SIPI clears all
the LVT entries except the mask bit. Essentially this results in
all LVT entries including the thermal monitoring interrupt set
to masked (clearing the bios programmed value for APIC_LVTTHMR).
And this leads to kernel take over the thermal monitoring
interrupt on AP's but not on BSP (leaving the bios programmed
value only on BSP).
As a result of this, we have seen system hangs when the thermal
monitoring interrupt is generated.
Fix this by reading the initial value of thermal LVT entry on
BSP and if bios has taken over the control, then program the
same value on all AP's and leave the thermal monitoring
interrupt control on all the logical cpu's to the bios.
Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Arjan van de Ven <arjan@infradead.org>
LKML-Reference: <20091110013824.GA24940@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
We should not use physid_mask_t as a stack based
variable in apic code. This type depends on MAX_APICS
parameter which may be huge enough.
Especially it became a problem with apic NOOP driver which
is portable between 32 bit and 64 bit environment
(where we have really huge MAX_APICS).
So apic driver should operate with pointers and a caller
in turn should aware of allocation physid_mask_t variable.
As a side (but positive) effect -- we may use already
implemented physid_set_mask_of_physid function eliminating
default_apicid_to_cpu_present completely.
Note that physids_coerce and physids_promote turned into static
inline from macro (since macro hides the fact that parameter is
being interpreted as unsigned long, make it explicit).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20091109220659.GA5568@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It will take some time for binutils (gas) to support some newly added
instructions, such as SSE4.1 instructions or the AES-NI instructions
found in upcoming Intel CPU.
To make the source code can be compiled by old binutils, .byte code is
used instead of the assembly instruction. But the readability and
flexibility of raw .byte code is not good.
This patch solves the issue of raw .byte code via generating it via
assembly instruction like gas macro. The syntax is as close as
possible to real assembly instruction.
Some helper macros such as MODRM is not a full feature
implementation. It can be extended when necessary.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In fact it's never get used on x86-64 (for 64 bit platform
we use differ technique to enumerate io-units).
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20091108131645.GD5300@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We should be ready that one day MAX_IO_APICS may raise its
number. To prevent memory overwrite we're to use safe
snprintf while set IO-APIC resourse name.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20091108155431.GC25940@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The whole page is reserved for IO-APIC fixmap
due to non-cacheable requirement. So lets note
this explicitly instead of playing with numbers.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
LKML-Reference: <20091108155356.GB25940@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rather than forcing GFP flags and DMA mask to be inconsistent,
GFP flags should be determined even for the fallback device
through dma_alloc_coherent_mask()/dma_alloc_coherent_gfp_flags().
This restores 64-bit behavior as it was prior to commits
8965eb1938 and
4a367f3a9d (not sure why there are
two of them), where GFP_DMA was forced on for 32-bit, but not
for 64-bit, with the slight adjustment that afaict even 32-bit
doesn't need this without CONFIG_ISA.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
LKML-Reference: <4AF18187020000780001D8AA@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch rebase the implementation of the breakpoints API on top of
perf events instances.
Each breakpoints are now perf events that handle the
register scheduling, thread/cpu attachment, etc..
The new layering is now made as follows:
ptrace kgdb ftrace perf syscall
\ | / /
\ | / /
/
Core breakpoint API /
/
| /
| /
Breakpoints perf events
|
|
Breakpoints PMU ---- Debug Register constraints handling
(Part of core breakpoint API)
|
|
Hardware debug registers
Reasons of this rewrite:
- Use the centralized/optimized pmu registers scheduling,
implying an easier arch integration
- More powerful register handling: perf attributes (pinned/flexible
events, exclusive/non-exclusive, tunable period, etc...)
Impact:
- New perf ABI: the hardware breakpoints counters
- Ptrace breakpoints setting remains tricky and still needs some per
thread breakpoints references.
Todo (in the order):
- Support breakpoints perf counter events for perf tools (ie: implement
perf_bpcounter_event())
- Support from perf tools
Changes in v2:
- Follow the perf "event " rename
- The ptrace regression have been fixed (ptrace breakpoint perf events
weren't released when a task ended)
- Drop the struct hw_breakpoint and store generic fields in
perf_event_attr.
- Separate core and arch specific headers, drop
asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
- Use new generic len/type for breakpoint
- Handle off case: when breakpoints api is not supported by an arch
Changes in v3:
- Fix broken CONFIG_KVM, we need to propagate the breakpoint api
changes to kvm when we exit the guest and restore the bp registers
to the host.
Changes in v4:
- Drop the hw_breakpoint_restore() stub as it is only used by KVM
- EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
module
- Restore the breakpoints unconditionally on kvm guest exit:
TIF_DEBUG_THREAD doesn't anymore cover every cases of running
breakpoints and vcpu->arch.switch_db_regs might not always be
set when the guest used debug registers.
(Waiting for a reliable optimization)
Changes in v5:
- Split-up the asm-generic/hw-breakpoint.h moving to
linux/hw_breakpoint.h into a separate patch
- Optimize the breakpoints restoring while switching from kvm guest
to host. We only want to restore the state if we have active
breakpoints to the host, otherwise we don't care about messed-up
address registers.
- Add asm/hw_breakpoint.h to Kbuild
- Fix bad breakpoint type in trace_selftest.c
Changes in v6:
- Fix wrong header inclusion in trace.h (triggered a build
error with CONFIG_FTRACE_SELFTEST
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kiszka <jan.kiszka@web.de>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
This patch cleans up pci_iommu_shutdown() a bit to use
x86_platform (similar to how IA64 initializes an IOMMU driver).
This adds iommu_shutdown() to x86_platform to avoid calling
every IOMMUs' shutdown functions in pci_iommu_shutdown() in
order. The IOMMU shutdown functions are platform specific (we
don't have multiple different IOMMU hardware) so the current way
is pointless.
An IOMMU driver sets x86_platform.iommu_shutdown to the shutdown
function if necessary.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: joerg.roedel@amd.com
LKML-Reference: <20091027163358F.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
k8.h uses struct bootnode but does not #include a header file
for it, so provide a simple declaration for it.
arch/x86/include/asm/k8.h:13: warning: 'struct bootnode'
declared inside parameter list arch/x86/include/asm/k8.h:13:
warning: its scope is only this definition or declaration, which is probably not what you want
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20091028160955.d27ccb16.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
kernel missed to free memtype if get_vm_area_caller failed in
__ioremap_caller.
This patch introduces error path to fix this and cleans up the
repetitive error return sequences that contributed to the
creation of the bug.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1257389031-20429-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This makes the declarations match the definitions, which already
use 'struct cpumask'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <200911052245.41803.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove unused thread_return label from switch_to() macro on
x86-64. Since this symbol cuts into schedule(), backtrace at the
latter half of schedule() was always shown as thread_return().
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
LKML-Reference: <20091105160359.5181.26225.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We have a board with a Phoenix/MSC BIOS which also corrupts the low
64KB of RAM, so add an entry to the table.
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
LKML-Reference: <20091106154404.002648d9@marrow.netinsight.se>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The roundup() caused a build error (undefined reference to `__udivdi3').
We're aligning to power-of-two boundaries, so it's simpler to just use
ALIGN() anyway, which avoids the division.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Now that we have a generic 32bit compatibility implementation
there is no need for x86 to implement it's own.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
We plan to make the breakpoints parameters generic among architectures.
For that it's better to move the asm-generic header to a generic linux
header.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
* 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: get_tss_base_addr() should return a gpa_t
KVM: x86: Catch potential overrun in MCE setup
The current implementation of get_user_desc() sign extends the return
value because of integer promotion rules. For the most part, this
doesn't matter, because the top bit of base2 is usually 0. If, however,
that bit is 1, then the entire value will be 0xffff... which is
probably not what the caller intended.
This patch casts the entire thing to unsigned before returning, which
generates almost the same assembly as the current code but replaces the
final "cltq" (sign extend) with a "mov %eax %eax" (zero-extend). This
fixes booting certain guests under KVM.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
xen: mask extended topology info in cpuid
xen/hvc: make sure console output is always emitted, with explicit polling
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix kthread_bind() by moving the body of kthread_bind() to sched.c
sched: Disable SD_PREFER_LOCAL at node level
sched: Fix boot crash by zalloc()ing most of the cpu masks
sched: Strengthen buddies and mitigate buddy induced latencies
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, fs: Fix x86 procfs stack information for threads on 64-bit
x86: Add reboot quirk for 3 series Mac mini
x86: Fix printk message typo in mtrr cleanup code
dma-debug: Fix compile warning with PAE enabled
x86/amd-iommu: Un__init function required on shutdown
x86/amd-iommu: Workaround for erratum 63
PCI device BARs are guaranteed to start and end on at least a four-byte
(I/O) or a sixteen-byte (MMIO) boundary because they're aligned on their
size and the low BAR bits are reserved. PCI-to-PCI bridge apertures
have even larger alignment restrictions.
However, some BIOSes (e.g., HP DL360 BIOS P31) report host bridge windows
like "[io 0x0000-0x2cfe]". This is wrong because it excludes the last
port at 0x2cff: it's impossible for a downstream device to claim 0x2cfe
without also claiming 0x2cff. In fact, this BIOS configures a device
behind the bridge to "[io 0x2c00-0x2cff]", so we know the window actually
does include 0x2cff.
This patch rounds the start and end of apertures to the appropriate
boundary. I experimentally determined that Windows contains a similar
workaround; details here:
http://bugzilla.kernel.org/show_bug.cgi?id=14337
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We have occasional problems with PCI resource allocation, and sometimes
they could be avoided by paying attention to what ACPI tells us about
the host bridges. This patch doesn't change the behavior, but it prints
window information that should make debugging easier.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This makes PCI resource management messages more consistent and adds a few
new messages to aid debugging.
Whenever we assign resources to a device, update a BAR, or change a
bridge aperture, it's worth noting it.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use the dev_printk-like "%04x:%02x" format for printing PCI bus numbers.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Jesse accidentally applied v1 [1] of the patchset instead of v2 [2]. This
is the diff between v1 and v2.
The changes in this patch are:
- tidied vsprintf stack buffer to shrink and compute size more
accurately
- use %pR for decoding and %pr for "raw" (with type and flags) instead
of adding %pRt and %pRf
[1] http://lkml.org/lkml/2009/10/6/491
[2] http://lkml.org/lkml/2009/10/13/441
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We use dev_dbg() in arch/x86/pci, but there's no easy way to turn it
on. Add -DDEBUG when CONFIG_PCI_DEBUG=y, just like we do in drivers/pci.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Move xen_domain and related tests out of asm-x86 to xen/xen.h so they
can be included whenever they are necessary.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The current whitelist requires a kernel change for every machine that has
MMCONFIG regions above 4GB, even if BIOS provides a correct MCFG table.
This patch expands the whitelist to include machines with a rev 1 or newer
MCFG table and a DMI_BIOS_DATE of 2010 or later. That way, we only need
kernel changes for new machines that provide incorrect MCFG tables.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
CC: Matthew Wilcox <willy@linux.intel.com>
CC: John Keller <jpk@sgi.com>
CC: Yinghai Lu <yhlu.kernel@gmail.com>
CC: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
CC: Andi Kleen <andi@firstfloor.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Thomas Schlichter reported:
> X.org uses libpciaccess which tries to mmap with write combining enabled via
> /sys/bus/pci/devices/*/resource0_wc. Currently, when PAT is not enabled, the
> kernel does fall back to uncached mmap. Then libpciaccess thinks it succeeded
> mapping with write combining enabled and does not set up suited MTRR entries.
> ;-(
Instead of silently mapping pci mmap region as UC minus in the case
of !pat_enabled and wc request, we can return error. Eric Anholt mentioned
that caller (like X) typically follows up with UC minus pci mmap request and
if there is a free mtrr slot, caller will manage adding WC mtrr.
Jesse Barnes says:
> Older versions of libpciaccess will behave better if we do it that way
> (iirc it only allocates an MTRR if the resource_wc file doesn't exist or
> fails to get mapped).
Reported-by: Thomas Schlichter <thomas.schlichter@web.de>
Signed-off-by: Thomas Schlichter <thomas.schlichter@web.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Instead of the PCI code needing to have code to determine the
cacheline size of each processor, use the data the cpu identification
code should have already determined during early boot.
(The vendor checks are also incomplete, and don't take into account
modern CPUs)
I've been carrying a variant of this code in Fedora for a while,
that prints debug information. There are a number of cases where we
are currently setting the PCI cacheline size to 32 bytes, when the CPU
cacheline size is 64 bytes. With this patch, we set them both the same.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Till now, CLS has been determined either by arch code or as
L1_CACHE_BYTES. Only x86 and ia64 set CLS explicitly and x86 doesn't
always get it right. On most configurations, the chance is that
firmware configures the correct value during boot.
This patch makes pci_init() determine CLS by looking at what firmware
has configured. It scans all devices and if all non-zero values
agree, the value is used. If none is configured or there is a
disagreement, pci_dfl_cache_line_size is used. arch can set the dfl
value (via PCI_CACHE_LINE_BYTES or pci_dfl_cache_line_size) or
override the actual one.
ia64, x86 and sparc64 updated to set the default cls instead of the
actual one.
While at it, declare pci_cache_line_size and pci_dfl_cache_line_size
in pci.h and drop private declarations from arch code.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Miller <davem@davemloft.net>
Acked-by: Greg KH <gregkh@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For intel systems with multi IOH, we should read peer root resources
directly from PCI config space, and don't trust _CRS.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If TSS we are switching to resides in high memory task switch will fail
since address will be truncated. Windows2k3 does this sometimes when
running with more then 4G
Cc: stable@kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
We only allocate memory for 32 MCE banks (KVM_MAX_MCE_BANKS) but we
allow user space to fill up to 255 on setup (mcg_cap & 0xff), corrupting
kernel memory. Catch these overflows.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch fixes two issues in the procfs stack information on
x86-64 linux.
The 32 bit loader compat_do_execve did not store stack
start. (this was figured out by Alexey Dobriyan).
The stack information on a x64_64 kernel always shows 0 kbyte
stack usage, because of a missing implementation of the KSTK_ESP
macro which always returned -1.
The new implementation now returns the right value.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1257240160.4889.24.camel@wall-e>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Note that there's no freeing the cpu var, since this module has
no unload function.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
LKML-Reference: <200911031458.30987.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo wants the certainty of a static cpumask (rather than a
cpumask_var_t), but cpumask_t will some day be undefined to
avoid on-stack declarations.
This is what DECLARE_BITMAP/to_cpumask() is for.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <200911031453.52394.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
tools/perf/Makefile
Merge reason: Resolve the conflict, merge to upstream and merge in
perf fixes so we can add a dependent patch.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This jump should be unconditional.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1257274925-15713-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
A Xen guest never needs to know about extended topology, and knowing
would just confuse it.
This patch just zeros ebx in leaf 0xb which indicates no topology info,
preventing a crash under Xen on cpus which support this leaf.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>