Move the cleanup of the async queue to the close callback from the flush
callback. This avoids losing asynchronous overflow notifications when
the file descriptor is shared by multiple processes and one terminates.
Signed-off-by: Stephane Eranian <eranian@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
move interrupt, page_fault, non_syscall, dispatch_unaligned_handler and
dispatch_to_fault_handler to avoid lack of instructin space.
The change set 4dcc29e157 bloated
SAVE_MIN_WITH_COVER, SAVE_MIN_WITH_COVER_R19 so that it bloated the
functions which uses those macros.
In the native case, only dispatch_illegal_op_fault had to be moved.
When paravirtualized case the all functions which use the macros need
to be moved to avoid the lack of space.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Introduce pv_time_ops which adds hook to steal time accounting.
On virtualized environment, cpus are shared by many guests and
steal time is the time which is used for other guests.
On virtualized environtment, streal time should be accounted.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
introduce pv_irq_ops which adds hooks to paravirtualize irq related
operations.
On virtualized environment, interruption may be replaced by something
virtualization friendly. So the irq related operation also may need
paravirtualization.
This patch adds necessary hooks to paravirtualize irq related operations.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
add hooks to paravirtualize iosapic which is a real hardware resource.
On virtualized environment it may be replaced something virtualized
friendly.
Define pv_iosapic_ops and add the hooks.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
define pv_init_ops hooks which represents various initialization
hooks for paravirtualized environment. and add hooks.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Make NR_IRQ overridable by each pv instances.
Pv instance may need each own number of irqs so that
NR_IRQS should be the maximum number of nr_irqs each
pv instances need.
Cc: Jes Sorensen <jes@sgi.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
paravirtualize ia64_swtich_to, ia64_leave_syscall and ia64_leave_kernel.
They include sensitive or performance critical privileged instructions
so that they need paravirtualization.
To paravirtualize them by single source and multi compile
they are converted into indirect jump. And define each pv instances.
Cc: Keith Owens <kaos@ocs.com.au>
Cc: "Dong, Eddie" <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
paravirtualize ivt.S which implements fault handler in hand written
assembly code.
They includes sensitive or performance critical privileged instructions.
So they need paravirtualization.
Cc: Keith Owens <kaos@ocs.com.au>
Cc: tgingold@free.fr
Cc: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
paravirtualize minstate.h which are hand written assembly code.
They include sensitive or performance critical privileged
instructions. So that they are appropriate for paravirtualization.
Cc: Keith Owens <kaos@ocs.com.au>
Cc: Akio Takebe <takebe_akio@jp.fujitsu.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Preparation for paravirtualization of hand written assembly code.
They are paravirtualized by single source code and compiled multi times.
To tell those files for target (including native), add one defines.
Cc: "Dong, Eddie" <eddie.dong@intel.com>
Cc: Keith Owens <kaos@ocs.com.au>
Cc: tgingold@free.fr
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
introduce pv_cpu_ops to paravirtualize privleged instructions
which are defined by ia64 intrinsics.
make them indirect C function calls by introducing function
tables, pv_cpu_ops.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch adds a setup hook in the very early boot sequence
before start_kernel() to initialize paravirtualization stuff.
The hook will be set by each pv loader code or by using multi entry point.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
introduce pv_info which describes some randome info about
underlying execution environment.
Cc: Jes Sorensen <jes@sgi.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Move the LOAD_OFFSET definition from vmlinux.lds.S into system.h.
On paravirtualized environments, it is necessary to detect the
execution environment. One of the solutions is the multi entry point.
The multi entry point allows a boot loader to start the kernel execution
from the entry point which is different from the ELF entry point.
The non standard entry point will defined as the specialized elf note
which contains the LMA of the entry point symbol.
The constant, LOAD_OFFSET, is necessary to calculate the symbol's LMA.
Move the definition into the public header file to make it available
to the multi entry point support.
Cc: "He, Qing" <qing.he@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
remove extern declaration of handle_IPI() in irq_ia64.c.
Instead, declare it in asm-ia64/smp.h.
Later handle_IPI() will be referenced from another file.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Problem: An application violating the architectural rules regarding
operation dependencies and having specific Register Stack Engine (RSE)
state at the time of the violation, may result in an illegal operation
fault and invalid RSE state. Such faults may initiate a cascade of
repeated illegal operation faults within OS interruption handlers.
The specific behavior is OS dependent.
Implication: An application causing an illegal operation fault with
specific RSE state may result in a series of illegal operation faults
and an eventual OS stack overflow condition.
Workaround: OS interruption handlers that switch to kernel backing
store implement a check for invalid RSE state to avoid the series
of illegal operation faults.
The core of the workaround is the RSE_WORKAROUND code sequence
inserted into each invocation of the SAVE_MIN_WITH_COVER and
SAVE_MIN_WITH_COVER_R19 macros. This sequence includes hard-coded
constants that depend on the number of stacked physical registers
being 96. The rest of this patch consists of code to disable this
workaround should this not be the case (with the presumption that
if a future Itanium processor increases the number of registers, it
would also remove the need for this patch).
Move the start of the RBS up to a mod32 boundary to avoid some
corner cases.
The dispatch_illegal_op_fault code outgrew the spot it was
squatting in when built with this patch and CONFIG_VIRT_CPU_ACCOUNTING=y
Move it out to the end of the ivt.
Signed-off-by: Tony Luck <tony.luck@intel.com>
acpi_unregister_gsi() should "undo" what acpi_register_gsi() does.
On systems that have legacy interrupts, acpi_unregister_gsi erroneously calls
iosapci_unregister_intr() which is wrong to do and causes a loud warning.
acpi_unregister_gsi() should just return in these cases.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There is only palinfo_handle_smp as (indirect) user of palinfo_smp_call (by
way of smp_call_function_single) and surely palinfo_handle_smp never pass
NULL as parameter for info.
Signed-off-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix a typo, and coding style cleanups for pfm_handle_work().
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch does:
- make comment at next to resched check more robust
- move "re-check" comments to next to where change predicate regs
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
[Bug-fix for "[BUG?][2.6.25-mm1] sleeping during IRQ disabled"]
This patch does:
- enable interrupts before calling schedule() as same as others, ex. x86
- enable interrupts during ia64_do_signal() and ia64_sync_krbs()
- do_notify_resume_user() is still called with interrupts disabled, since
we can take short path of fsys_mode if-statement quickly.
- pfm_handle_work() is also called with interrupts disabled, since
it can deal interrupt mask within itself.
- fix/add some comments/notes
Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The sequence executed in check_sal_cache_flush:
- pend a timer interrupt
- call SAL_CACHE_FLUSH
- see if interrupt is still pending
can hang HP machines with buggy SAL_CACHE_FLUSH implementations.
Provide a kernel command-line argument to allow users skip this
check if desired. Using this parameter will force ia64_sal_cache_flush
to call ia64_pal_cache_flush() instead of SAL_CACHE_FLUSH.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Some IA64 machines map all cell-local memory above 4 GB (32 bit limit).
However, in most cases, the kernel needs some memory below that limit that is
DMA-capable. So in this machine configuration, the crashkernel will be reserved
above 4 GB.
For machines that use SWIOTLB implementation because they lack an I/O MMU
the low memory is required by the SWIOTLB implementation. In that case,
it doesn't make sense to reserve the crashkernel at all because it's unusable
for kdump.
A special case is the "hpzx1" machine vector. In theory, it has a I/O MMU, so
it can be booted above 4 GB. However, in the kdump case that is not possible
because of changeset 51b58e3e26:
On HP zx1 machines, the 'machvec=dig' parameter is needed for the kdump
kernel to avoid problems with the HP sba iommu. The problem is that during
the boot of the kdump kernel, the iommu is re-initialized, so in-flight DMA
from improperly shutdown drivers causes an IOTLB miss which leads to an
MCA. With kdump, the idea is to get into the kdump kernel with as little
code as we can, so shutting down drivers properly is not an option.
The workaround is to add 'machvec=dig' to the kdump kernel boot parameters.
This makes the kdump kernel avoid using the sba iommu altogether, leaving
the IOTLB intact. Any ongoing DMA falls harmlessly outside the kdump
kernel. After the kdump kernel reboots, all devices will have been
shutdown properly and DMA stopped.
This patch pushes that functionality into the sba iommu initialization
code, so that users won't have to find the obscure documentation telling
them about 'machvec=dig'.
This means that also for hpzx1 it's not possible to boot when all
memory is above the 4 GB limit. So the only machine vectors that can handle
this case are "sn2" and "uv".
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch adds the basic IA64 machvec infrastructure to support
the SGI "UV" platform.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Races galore... General rule: as soon as it's in descriptor table,
it's over; another thread might have started IO on it/dup2() it
elsewhere/dup2() something *over* it/etc. fd_install() is the very
last step one should take - it's a point of no return.
Besides, the damn thing leaked on failure exits...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Replace TIF_RESTORE_SIGMASK with TS_RESTORE_SIGMASK and define
our own set_restore_sigmask() function. This saves the costly
SMP-safe set_bit operation, which we do not need for the sigmask
flag since TIF_SIGPENDING always has to be set too.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix indenting of switch statement to follow CodingStyle, and
pull out handling of call_data into an inlined function.
I confirmed that applying this fix doesn't affect assembled code.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch silences:
WARNING: vmlinux.o(.text+0x44672): Section mismatch in
reference from the function arch_register_cpu() to the
function .cpuinit.text:register_cpu()
Changes are based on codes in arch/x86/kernel/topology.c
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes following warning:
WARNING: vmlinux.o(.exit.text+0xb1): Section mismatch in
reference from the function palinfo_exit() to the variable
.cpuinit.data:palinfo_cpu_notifier
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch shuts up the following:
WARNING: vmlinux.o(.text+0x7102): Section mismatch in
reference from the function fixup_irqs() to the function
.devinit.text:ia64_disable_timer()
Removing ia64_disable_timer() is safe because there are no functions
calling it other than the fixup_irqs(),
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch kills:
WARNING: vmlinux.o(.text+0x1702): Section mismatch in
reference from the function acpi_register_ioapic() to the
function .devinit.text:iosapic_init()
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
TIF_RESTORE_SIGMASK no longer needs to be in the _TIF_WORK_* masks.
Those low bits are scarce. Renumber TIF_RESTORE_SIGMASK to free one up.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Legacy HP ia64 platforms currently cannot provide
/proc/cpuinfo/physical_id due to legacy SAL/PAL implementations.
However, that physical topology information can be obtained
via ACPI.
Provide an interface that gives ACPI one last chance to provide
physical_id for these legacy platforms. This logic only comes
into play iff:
- ACPI actually provides slot information for the CPU
- we lack a valid socket_id
Otherwise, we don't do anything.
Since x86 uses the ACPI processor driver as well, we provide a nop
stub function for arch_fix_phys_package_id() in asm-x86/topology.h
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Commit 113134fcbc changed the flow of
control when calling PAL_LOGICAL_TO_PHYSICAL and SAL_PHYSICAL_ID_INFO.
With the change, if a platform did not implement the latter, a useless
printk would appear in the boot log:
ia64_sal_pltid failed with -1
So let's check the return code and only printk on a true error, and do
not print anything in the unimplemented case. While we're in there,
clean up some stylistic issues too.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Enable the uncached allocator to allocate multiple pages of contiguous
uncached memory.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- remove unused 'irq' argument from pfm_do_interrupt_handler()
- remove pointless cast to void*
- add KERN_xxx prefix to printk()
- remove braces around singleton C statement
- in tioce_provider.c, start tioce_dma_consistent() and
tioce_error_intr_handler() function declarations in column 0
This change's main purpose is to prepare for the patchset in
jgarzik/misc-2.6.git#irq-remove, that explores removal of the
never-used 'irq' argument in each interrupt handler.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There are many notify_die() and almost all take same style with
ia64_mca_spin(). This patch defines macros and replace them all,
to reduce lines and to improve readability.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There are 3 hooks in MCA handler, but this DIE_MCA_MONARCH_PROCESS
event does not notified other than for the first monarch.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While testing with CONFIG_VIRT_CPU_ACCOUNTING=y, I found that
I occasionally get very huge system time in some threads.
So I dug the issue and finally noticed that it was caused
because of an interrupt which interrupt in the following window:
> [arch/ia64/kernel/entry.S: (!CONFIG_PREEMPT && CONFIG_VIRT_CPU_ACCOUNTING)]
>
> ENTRY(ia64_leave_syscall)
> :
> (pUStk) rsm psr.i
> cmp.eq pLvSys,p0=r0,r0 // pLvSys=1: leave from syscall
> (pUStk) cmp.eq.unc p6,p0=r0,r0 // p6 <- pUStk
> .work_processed_syscall:
> adds r2=PT(LOADRS)+16,r12
> (pUStk) mov.m r22=ar.itc // fetch time at leave
> adds r18=TI_FLAGS+IA64_TASK_SIZE,r13
> ;;
> <<< window: from here >>>
> (p6) ld4 r31=[r18] // load current_thread_info()->flags
> ld8 r19=[r2],PT(B6)-PT(LOADRS)
> adds r3=PT(AR_BSPSTORE)+16,r12
> ;;
> mov r16=ar.bsp
> ld8 r18=[r2],PT(R9)-PT(B6)
> (p6) and r15=TIF_WORK_MASK,r31 // any work other than TIF_SYSCALL_TRACE?
> ;;
> ld8 r23=[r3],PT(R11)-PT(AR_BSPSTORE)
> (p6) cmp4.ne.unc p6,p0=r15, r0 // any special work pending?
> (p6) br.cond.spnt .work_pending_syscall
> ;;
> ld8 r9=[r2],PT(CR_IPSR)-PT(R9)
> ld8 r11=[r3],PT(CR_IIP)-PT(R11)
> (pNonSys) break 0 // bug check: we shouldn't be here if pNonSys is TRUE!
> ;;
> invala
> <<< window: to here >>>
> rsm psr.i | psr.ic // turn off interrupts and interruption collection
If pUStk is true, it means we are going to return user mode, hence we fetch
ar.itc to get time at leave from system.
It seems that it is not possible to interrupt the window if pUStk is true,
because interrupts are disabled early. And also disabling interrupt makes
sense because it is safe for referring current_thread_info()->flags.
However interrupting the window while pUStk is true was possible.
The route was:
ia64_trace_syscall
-> .work_pending_syscall_end
-> .work_processed_syscall
Only in case entering the window from this route, interrupts are enabled
during in the window even if pUStk is true. I suppose interrupts must be
disabled here anyway if pUStk is true.
I'm not sure but afraid that what kind of bad effect were there, other
than crazy system time which I found.
FYI, there was a commit 6f6d75825d that
points out a bug at same point(exit of ia64_trace_syscall) in 2006.
It can be said that there was an another bug.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Semaphores are no longer performance-critical, so a generic C
implementation is better for maintainability, debuggability and
extensibility. Thanks to Peter Zijlstra for fixing the lockdep
warning. Thanks to Harvey Harrison for pointing out that the
unlikely() was unnecessary.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
This patch fixes the problem that kdump by INIT does not work if we use
makedumpfile. The problem is that after INIT is issued, 2nd kernel
starts and makedumpfile fails with the following error message.
/proc/vmcore doesn't contain vmcoreinfo.
'-x' or '-i' must be specified.
makedumpfile Failed.
The cause of this problem is that kernel does not call
crash_save_vmcoreinfo. When kdump starts by panic or sysrq-trigger,
crash_save_vmcoreinfo is called by crash_kexec. But this function is not
called when kdump starts by INIT. The Attached patch fixes this.
This patch just adds crash_save_vmcoreinfo into machine_kdump_on_init so
that crash_save_vmcoreinfo can be called when kdump starts by INIT.
I tested this patch with linux-2.6.25-rc9 and I confirmed it worked.
Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There is a NUMA memory configuration issue in 2.6.24:
A 2-node machine of ours has got the following memory layout:
Node 0: 0 - 2 Gbytes
Node 0: 4 - 8 Gbytes
Node 1: 8 - 16 Gbytes
Node 0: 16 - 18 Gbytes
"efi_memmap_init()" merges the three last ranges into one.
"register_active_ranges()" is called as follows:
efi_memmap_walk(register_active_ranges, NULL);
i.e. once for the 4 - 18 Gbytes range. It picks up the node
number from the start address, and registers all the memory for
the node #0.
"register_active_ranges()" should be called as follows to
make sure there is no merged address range at its entry:
efi_memmap_walk(filter_memory, register_active_ranges);
"filter_memory()" is similar to "filter_rsvd_memory()",
but the reserved memory ranges are not filtered out.
Signed-off-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The functions time_before, time_before_eq, time_after, and time_after_eq are
more robust for comparing jiffies against other values.
So use the time_after() & time_before() macros, defined at linux/jiffies.h,
which deal with wrapping correctly
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: S.Caglar Onur <caglar@pardus.org.tr>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add kprobe-booster support on ia64.
Kprobe-booster improves the performance of kprobes by eliminating single-step,
where possible. Currently, kprobe-booster is implemented on x86 and x86-64.
This is an ia64 port.
On ia64, kprobe-booster executes a copied bundle directly, instead of single
stepping. Bundles which have B or X unit and which may cause an exception
(including break) are not executed directly. And also, to prevent hitting
break exceptions on the copied bundle, only the hindmost kprobe is executed
directly if several kprobes share a bundle and are placed in different slots.
Note: set_brl_inst() is used for preparing an instruction buffer(it does not
modify any active code), so it does not need any atomic operation.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: bibo,mao <bibo.mao@intel.com>
Cc: Rusty Lynch <rusty.lynch@intel.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The sys_getpid() and sys_set_tid_address() behavior changed from
return current->tgid
to
struct pid *pid;
pid = current->pids[PIDTYPE_PID].pid;
return pid->numbers[pid->level].nr;
But the fast system calls on ia64 still operate the old way. Patch them
appropriately to let ia64 work with pid namespaces. Besides, this is one more
step in deprecating of pid and tgid on task_struct.
The fsys_getppid() is to be patched as well, but its logic is much
more complex now, so I will make it later.
One thing I'm not 100% sure is the trick with the IA64_UPID_SHIFT. On order
to access the pid->level's element of an array I have to perform the following
calculations
pid + sizeof(struct upid) * pid->level
The problem is that ia64 can only multiply float point registers, while all
the offsets I have in code are in rXX ones. Fortunately, the sizeof(struct
upid) is 32 bytes on ia64 (and is very unlikely to ever change), so the
calculations get simpler:
pid + pid->level << 5
So, I introduce the IA64_UPID_SHIFT and use the shl instruction. I also
looked at how gcc compiles the similar place and found that it makes it with
shift as well. Is this OK to do so?
Tested with ski emulator with 2.6.24 kernel, but fits 2.6.25-rc4 and
2.6.25-rc4-mm1 as well.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: David Mosberger-Tang <davidm@hpl.hp.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
do_each_thread/while_each_thread is a double loop, so
should use 'goto' rather than 'break' to break out
the loop.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
One should normally unlock in the reverse order of the lock calls,
and in this case there certainly is no reason not to.
Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
While it is convenient that we can invoke kdump by asserting INIT
via button on chassis etc., there are some situations that invoking
kdump on fatal MCA is not welcomed rather than rebooting fast without
dump.
This patch adds a new flag 'kdump_on_fatal_mca' that is independent
from 'kdump_on_init' currently available. Adding this flag enable
us to turning on/off of kdump depend on the event, INIT and/or fatal
MCA. Default for this flag is to take the dump.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This attached patch significantly shrinks boot memory allocation on ia64.
It does this by not allocating per_cpu areas for cpus that can never
exist.
In the case where acpi does not have any numa node description of the
cpus, I defaulted to assigning the first 32 round-robin on the known
nodes.. For the !CONFIG_ACPI I used for_each_possible_cpu().
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The patch defines kernel parameter "nptcg=". The parameter overrides max number
of concurrent global TLB purges which is reported from either PAL_VM_SUMMARY or
SAL PALO.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
According to SDM2.2, Itanium supports multiple outstanding ptc.g instructions.
But current kernel function ia64_global_tlb_purge() uses a spinlock to serialize
ptc.g instructions issued by multiple processors. This serialization might have
scalability issue on a big SMP machine where many processors could purge TLB
in parallel.
The patch fixes this problem by issuing multiple ptc.g instructions in
ia64_global_tlb_purge(). It also adds support for the "PALO" table to get
a platform view of the max number of outstanding ptc.g instructions (which
may be different from the processor view found from PAL_VM_SUMMARY).
PALO specification can be found at: http://www.dig64.org/home/DIG64_PALO_R1_0.pdf
spinaphore implementation by Matthew Wilcox.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This interface provides more flexible functionality for smp
infrastructure ... e.g. KVM frequently needs to operate on
a subset of cpus.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Dynamic TR resource should be managed in the uniform way.
Add two interfaces for kernel:
ia64_itr_entry: Allocate a (pair of) TR for caller.
ia64_ptr_entry: Purge a (pair of ) TR by caller.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
We have duplicate code to access registers (access_uarea and regset
way). They just have different layout, so remove duplicate code.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
After we have regset support, we can use CORE_DUMP_USE_REGSET.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is the 32-bit regset implementation under IA64. Basically register
read/write, which is derived from current ptrace register read/write.
This version added TLS support.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is the 64-bit regset implementation under IA64. Basically register
read/write, which is derived from current ptrace register read/write.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch does:
- Remove outdated comments (which someday I marked with "?").
- Reassemble instructions to fit them in fewer bundles.
- If McKinley Errata 9 workaround is not needed, the workaround
bundles will be patched out with NOPs. However it also not
needed to have a totally NOP bundle (nop * 3) before branch.
As a result, this makes the code path 3 (or 2) bundles shorter
(and remove 1 unnecessary stop bit). It seems to be 1% faster.
(10sec loop test, with nojitter @ Madison 1.5GHz x 4)
Before:
CPU 0: 0.14 (usecs) (0 errors / 69598875 iterations)
CPU 1: 0.14 (usecs) (0 errors / 69630721 iterations)
CPU 2: 0.14 (usecs) (0 errors / 69607850 iterations)
CPU 3: 0.14 (usecs) (0 errors / 69619832 iterations)
After:
CPU 0: 0.14 (usecs) (0 errors / 70257728 iterations)
CPU 1: 0.14 (usecs) (0 errors / 70309498 iterations)
CPU 2: 0.14 (usecs) (0 errors / 70280639 iterations)
CPU 3: 0.14 (usecs) (0 errors / 70260682 iterations)
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ia64 named their handler kprobes_fault_handler while all other
arches used kprobe_fault_handler. Change the function definition
and header declaration.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When EFI_DEBUG is defined to a non-zero value in arch/ia64/kernel/efi.c,
the efi memory regions are displayed. This patch enhances the
display code in a few ways:
1. Use TB, GB and MB as well as KB as units.
Although this introduces rounding errors (KB doesn't as
size is always a multiple of 4Kb), it does make
things a lot more readable.
Also as the range is also shown, it is possible to note the exact size
if it is important. In my experience, the size field is mostly useful
for getting a general idea of the size of a region.
On the rx2620 that I use, there actually is an 8TB region (though not
backed by physical memory, and 8TB really is a lot more readable than
8589934592KB.
2. pad the size field with leading spaces to further improve readability
...
... ( 8MB)
... ( 928MB)
... ( 3MB)
...
vs
...
... (8MB)
... (928MB)
... (3MB)
...
3. Pad the attr field out to 64bits using leading zeros,
to further improve readability.
...
mem05: type= 2, attr=0x0000000000000008, range=[0x0000000004000000-0x000000000481f000) ( 8MB)
mem06: type= 7, attr=0x0000000000000008, range=[0x000000000481f000-0x000000003e876000) ( 928MB)
mem07: type= 5, attr=0x8000000000000008, range=[0x000000003e876000-0x000000003eb8e000) ( 3MB)
mem08: type= 4, attr=0x0000000000000008, range=[0x000000003eb8e000-0x000000003ee7a000) ( 2MB)
...
...
mem05: type= 2, attr=0x8, range=[0x0000000004000000-0x000000000481f000) ( 8MB)
mem06: type= 7, attr=0x8, range=[0x000000000481f000-0x000000003e876000) ( 928MB)
mem07: type= 5, attr=0x8000000000000008, range=[0x000000003e876000-0x000000003eb8e000) ( 3MB)
mem08: type= 4, attr=0x8, range=[0x000000003eb8e000-0x000000003ee7a000) ( 2MB)
...
4. Use %d instead of %u for the index field, as i is a signed int.
N.B: This code is not compiled unless EFI_DEBUG is non 0.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
__FUNCTION__ is gcc-specific, use __func__
Long lines have been kept where they exist, some small spacing changes
have been done.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When !CONFIG_SMP, cpu_physical_id() is ia64_get_lid(), which is
functionally identical to
(ia64_getreg(_IA64_REG_CR_LID) >> 16) & 0xffff
so there's no need for two versions of this code.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove all code which does exactly the same thing as ptrace_request().
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Tony Luck <tony.luck@intel.com>
find_thread_for_addr() is no longer needed. It was only used to find
the correct kernel RBS for a given memory address, but since the kernel
RBS is not needed any longer, this function can go away.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Syncing is no longer needed, because user RBS is already
up-to-date. Actually, if a debugger modified the contents
of the original RBS prior to changing PT_AR_BSP, the
modifications would get overwritten.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Because the user RBS of a process is now completely stored in
user-mode when the process is ptrace-stopped, accesses to the
RBS should no longer augment any part of the kernel RBS.
This means we can get rid of most ia64_peek() and ia64_poke()
calls.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch fixes the following compile error with a recent gcc:
CC kernel/kprobes.o
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/kprobes.c:1066: error: __ksymtab_jprobe_return causes a section type conflict
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This fixes regression introduced in 113134fcbc
Intel Tiger platforms hang when calling SAL_GET_PHYSICAL_ID_INFO
instead of properly returning -1 for unimplemented, so add a
version check.
SGI Altix platforms have an incorrect SAL version hard-coded into
their prom -- they encode 2.9, but actually implement 3.2 -- so
fix it up and allow ia64_sal_get_physical_id_info to keep
working.
Signed-off-by: Alex Chiang <achiang@hp.com>
Acked-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the problem that the following error message is sometimes displayed
at irq migration when vector domain is enabled.
"Unexpected interrupt vector %d on CPU %d is not mapped to any IRQ!"
The cause of this problem is an interrupt is sent to the previous
target CPU after cleaning up vector to irq mapping table. To clean up
vector to irq map on the previous target CPU safty, change the irq
migration in multiple vector domain as follows. The original idea is
from x86 interrupt management code.
- Delay vector to irq table cleanup until the interrupts are sent
to new target CPUs. By this, it is ensured that target CPU is
completely changed on the interrupt controller side.
- Even after the interrupts are sent to new target CPUs, there can
be pended interrupts remaining on the previous target CPU. So we
need to delay clearning up vector to irq table until the pended
interrupt is handled. For this, send IPI to the previous target
CPU with lower priority vector and clean up vector to irq table
in its handler.
This patch affects only to irq migration code with multiple vector
domain is enabled.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The similar check has been added to x86_32(i386) in commit
id 83bd01024b.
So we add this check to ia64 and improve it a liitle bit in that
we need to check for stack overflow only when the signal is on stack.
Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch implements VIRT_CPU_ACCOUNTING for ia64,
which enable us to use more accurate cpu time accounting.
The VIRT_CPU_ACCOUNTING is an item of kernel config, which s390
and powerpc arch have. By turning this config on, these archs
change the mechanism of cpu time accounting from tick-sampling
based one to state-transition based one.
The state-transition based accounting is done by checking time
(cycle counter in processor) at every state-transition point,
such as entrance/exit of kernel, interrupt, softirq etc.
The difference between point to point is the actual time consumed
during in the state. There is no doubt about that this value is
more accurate than that of tick-sampling based accounting.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Fix large MCA bootmem allocation
[IA64] Simplify cpu_idle_wait
[IA64] Synchronize RBS on PTRACE_ATTACH
[IA64] Synchronize kernel RSE to user-space and back
[IA64] Rename TIF_PERFMON_WORK back to TIF_NOTIFY_RESUME
[IA64] Wire up timerfd_{create,settime,gettime} syscalls
The MCA code allocates bootmem memory for NR_CPUS, regardless
of how many cpus the system actually has. This change allocates
memory only for cpus that actually exist.
On my test system with NR_CPUS = 1024, reserved memory was reduced by 130944k.
Before: Memory: 27886976k/28111168k available (8282k code, 242304k reserved, 5928k data, 1792k init)
After: Memory: 28017920k/28111168k available (8282k code, 111360k reserved, 5928k data, 1792k init)
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is just Venki's patch[*] for x86 ported to ia64.
* http://marc.info/?l=linux-kernel&m=120249201318159&w=2
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When attaching to a stopped process, the RSE must be explicitly
synced to user-space, so the debugger can read the correct values.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
CC: Roland McGrath <roland@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is base kernel patch for ptrace RSE bug. It's basically a backport
from the utrace RSE patch I sent out several weeks ago. please review.
when a thread is stopped (ptraced), debugger might change thread's user
stack (change memory directly), and we must avoid the RSE stored in
kernel to override user stack (user space's RSE is newer than kernel's
in the case). To workaround the issue, we copy kernel RSE to user RSE
before the task is stopped, so user RSE has updated data. we then copy
user RSE to kernel after the task is resummed from traced stop and
kernel will use the newer RSE to return to user.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
CC: Roland McGrath <roland@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Since the RSE synchronization will need a TIF_ flag, but all
work-to-be-done bits are already used, so we have to multiplex
TIF_NOTIFY_RESUME again.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix typo in comments.
BTW: I have to fix coding style in arch/ia64/kernel/time.c also, otherwise
checkpatch.pl will be complaining.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the configuration dependencies in the vmcoreinfo data.
i386's "node_data" is defined in arch/x86/mm/discontig_32.c,
and x86_64's one is defined in arch/x86/mm/numa_64.c.
They depend on CONFIG_NUMA:
arch/x86/mm/Makefile_32:7
obj-$(CONFIG_NUMA) += discontig_32.o
arch/x86/mm/Makefile_64:7
obj-$(CONFIG_NUMA) += numa_64.o
ia64's "pgdat_list" is defined in arch/ia64/mm/discontig.c,
and it depends on CONFIG_DISCONTIGMEM and CONFIG_SPARSEMEM:
arch/ia64/mm/Makefile:9-10
obj-$(CONFIG_DISCONTIGMEM) += discontig.o
obj-$(CONFIG_SPARSEMEM) += discontig.o
ia64's "node_memblk" is defined in arch/ia64/mm/numa.c,
and it depends on CONFIG_NUMA:
arch/ia64/mm/Makefile:8
obj-$(CONFIG_NUMA) += numa.o
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Acked-by: Simon Horman <horms@verge.net.au>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset is for the vmcoreinfo data.
The vmcoreinfo data has the minimum debugging information only for dump
filtering. makedumpfile (dump filtering command) gets it to distinguish
unnecessary pages, and makedumpfile creates a small dumpfile.
This patch:
VMCOREINFO_SIZE() should be renamed VMCOREINFO_STRUCT_SIZE() since it's always
returning the size of the struct with a given name. This change would allow
VMCOREINFO_TYPEDEF_SIZE() to simply become VMCOREINFO_SIZE() since it need not
be used exclusively for typedefs.
This discussion is the following:
http://www.ussg.iu.edu/hypermail/linux/kernel/0709.3/0582.html
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
calibrate_delay() must be __cpuinit, not __{dev,}init.
I've verified that this is correct for all users.
While doing the latter, I also did the following cleanups:
- remove pointless additional prototypes in C files
- ensure all users #include <linux/delay.h>
This fixes the following section mismatches with CONFIG_HOTPLUG=n,
CONFIG_HOTPLUG_CPU=y:
WARNING: vmlinux.o(.text+0x1128d): Section mismatch: reference to .init.text.1:calibrate_delay (between 'check_cx686_slop' and 'set_cx86_reorder')
WARNING: vmlinux.o(.text+0x25102): Section mismatch: reference to .init.text.1:calibrate_delay (between 'smp_callin' and 'cpu_coregroup_map')
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christian Zankel <chris@zankel.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] make pfm_get_task work with virtual pids
[IA64] honor notify_die() returning NOTIFY_STOP
[IA64] remove dead code: __cpu_{down,die} from !HOTPLUG_CPU
[IA64] Appoint kvm/ia64 Maintainers
[IA64] ia64_set_psr should use srlz.i
[IA64] Export three symbols for module use
[IA64] mca style cleanup
[IA64] sn_hwperf semaphore to mutex
[IA64] generalize attribute of fsyscall_gtod_data
[IA64] efi.c Add /* never reached */ annotation
[IA64] efi.c Spelling/punctuation fixes
[IA64] Make efi.c mostly fit in 80 columns
[IA64] aliasing-test: fix gcc warnings on non-ia64
[IA64] Slim-down __clear_bit_unlock
[IA64] Fix the order of atomic operations in restore_previous_kprobes on ia64
[IA64] constify function pointer tables
[IA64] fix userspace compile error in gcc_intrin.h
This is the new timerfd API as it is implemented by the following patch:
int timerfd_create(int clockid, int flags);
int timerfd_settime(int ufd, int flags,
const struct itimerspec *utmr,
struct itimerspec *otmr);
int timerfd_gettime(int ufd, struct itimerspec *otmr);
The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.
The timerfd_settime() API give new settings by the timerfd fd, by optionally
retrieving the previous expiration time (in case the "otmr" parameter is not
NULL).
The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
is set in the "flags" parameter. Otherwise it's a relative time.
The timerfd_gettime() API returns the next expiration time of the timer, or
{0, 0} if the timerfd has not been set yet.
Like the previous timerfd API implementation, read(2) and poll(2) are
supported (with the same interface). Here's a simple test program I used to
exercise the new timerfd APIs:
http://www.xmailserver.org/timerfd-test2.c
[akpm@linux-foundation.org: coding-style cleanups]
[akpm@linux-foundation.org: fix ia64 build]
[akpm@linux-foundation.org: fix m68k build]
[akpm@linux-foundation.org: fix mips build]
[akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
[heiko.carstens@de.ibm.com: fix s390]
[akpm@linux-foundation.org: fix powerpc build]
[akpm@linux-foundation.org: fix sparc64 more]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This pid comes from user space, so treat it accordingly.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This requires making die() and die_if_kernel() return a value, and their
callers to honor this (and be prepared that it returns).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Neither __cpu_down() nor __cpu_die() are being referenced without
CONFIG_HOTPLUG_CPU.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The only in kernel use of ia64_set_psr() needs to follow
it with a srlz.i (since it is changing state for PSR.ic).
So it is pointless to issue srlz.d inside this function.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Since kvm/module needs to use some unexported functions in kernel,
so export them with this patch.
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
In an ordinary way,
> } __attribute__ ((aligned (L1_CACHE_BYTES)));
should be
> } ____cacheline_aligned;
to save some bytes on an uni-processor.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
As written, this loop could be for (;;) instead of do while (md). The tests
inside the loop always result in a return so the loop never terminates normally.
Signed-off-by: Aron Griffis <aron@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Incorporates the suggestions from Peter Chubb the last time I submitted
this. This called for using the same verb tense in the couple of preceding
comments as well.
Signed-off-by: Aron Griffis <aron@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch is purely whitespace changes to make the code fit in 80
columns, plus fix some inconsistent indentation. The efi_guidcmp()
tests remain wider than 80-columns since that seems to be the most
clear.
Signed-off-by: Aron Griffis <aron@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the order of atomic operations to prevent overwriting prev_kprobe[0].
To pop values from stack, we must decrement stack index right AFTER
reading values.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The ACPI_PDC_SMP_T_SWCOORD bit is set by and OS that is capable of
native ACPI throttling software coordination for mutli-processors
using the _TSD information.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
Remove commented-out code copied from NFS
NFS: Switch from intr mount option to TASK_KILLABLE
Add wait_for_completion_killable
Add wait_event_killable
Add schedule_timeout_killable
Use mutex_lock_killable in vfs_readdir
Add mutex_lock_killable
Use lock_page_killable
Add lock_page_killable
Add fatal_signal_pending
Add TASK_WAKEKILL
exit: Use task_is_*
signal: Use task_is_*
sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
ptrace: Use task_is_*
power: Use task_is_*
wait: Use TASK_NORMAL
proc/base.c: Use task_is_*
proc/array.c: Use TASK_REPORT
perfmon: Use task_is_*
...
Fixed up conflicts in NFS/sunrpc manually..
percpu_modcopy() is defined multiple times in arch files. However, the only
user is module.c. Put a static definition into module.c and remove
the definitions from the arch files.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- add support for PER_CPU_ATTRIBUTES
- fix generic smp percpu_modcopy to use per_cpu_offset() macro.
Add the ability to use generic/percpu even if the arch needs to override
several aspects of its operations. This will enable the use of generic
percpu.h for all arches.
An arch may define:
__per_cpu_offset Do not use the generic pointer array. Arch must
define per_cpu_offset(cpu) (used by x86_64, s390).
__my_cpu_offset Can be defined to provide an optimized way to determine
the offset for variables of the currently executing
processor. Used by ia64, x86_64, x86_32, sparc64, s/390.
SHIFT_PTR(ptr, offset) If an arch defines it then special handling
of pointer arithmentic may be implemented. Used
by s/390.
(Some of these special percpu arch implementations may be later consolidated
so that there are less cases to deal with.)
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch consolidate all definitions of .init.text, .init.data
and .exit.text, .exit.data section definitions in
the generic vmlinux.lds.h.
This is a preparational patch - alone it does not buy
us much good.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.
Cc: Tony Luck <tony.luck@intel.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The compiler team did the hard work for this distilling a problem in
large fortran application which showed up when applied to a 290MB input
data set down to this instruction:
ldfd f34=[r17],-8
Which they noticed incremented r17 by 0x10 rather than decrementing it
by 8 when the value in r17 caused an unaligned data fault. I tracked
it down to some bad instruction decoding in unaligned.c. The code
assumes that the 'x' bit can determine whether the instruction is
an "ldf" or "ldfp" ... which it is for opcode=6 (see table 4-29 on
page 3:302 of the SDM). But for opcode=7 the 'x' bit is irrelevent,
all variants are "ldf" instructions (see table 4-36 on page 3:306).
Note also that interpreting the instruction as "ldfp" means that the
"paired" floating point register (f35 in the example here) will also
be corrupted.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Currently CMCI mask of hot-added CPU is always disabled after CPU hotplug.
We should adjust this mask depending on CMC polling state.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This fixes an unused variable warning in mm/vmalloc.c.
Tony: also fix resulting fallout in uncached.c with a
typo in args to flush_tlb_kernel_range().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes the following assembler warning messages.
AS arch/ia64/kernel/head.o
arch/ia64/kernel/head.S: Assembler messages:
arch/ia64/kernel/head.S:1179: Warning: Use of 'ld8' violates RAW dependency 'CR[PTA]' (data)
arch/ia64/kernel/head.S:1179: Warning: Only the first path encountering the conflict is reported
arch/ia64/kernel/head.S:1178: Warning: This is the location of the conflicting usage
arch/ia64/kernel/head.S:1180: Warning: Use of 'ld8' violates RAW dependency 'CR[PTA]' (data)
arch/ia64/kernel/head.S:1180: Warning: Only the first path encountering the conflict is reported
arch/ia64/kernel/head.S:1178: Warning: This is the location of the conflicting usage
:
arch/ia64/kernel/head.S:1213: Warning: Use of 'ldf.fill.nta' violates RAW dependency 'CR[PTA]' (data)
arch/ia64/kernel/head.S:1213: Warning: Only the first path encountering the conflict is reported
arch/ia64/kernel/head.S:1178: Warning: This is the location of the conflicting usage
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes the following compiler warning messages.
CC arch/ia64/kernel/irq_ia64.o
arch/ia64/kernel/irq_ia64.c: In function 'create_irq':
arch/ia64/kernel/irq_ia64.c:343: warning: 'domain.bits[0u]' may be used uninitialized in this function
arch/ia64/kernel/irq_ia64.c: In function 'assign_irq_vector':
arch/ia64/kernel/irq_ia64.c:203: warning: 'domain.bits[0u]' may be used uninitialized in this function
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
I tried to upgrade an IA32 chroot on my IA64 to a new glibc with TLS.
It kept dying because set_thread_area was returning -ESRCH
(bugs.debian.org/451939).
I instrumented arch/ia64/ia32/sys_ia32.c:get_free_idx() and ended up
seeing output like
[pid] idx desc->a desc->b
-----------------------------
[2710] 0 -> c6b0ffff 40dff31b
[2710] 1 -> 0 0
[2710] 2 -> 0 0
[2710] 0 -> c6b0ffff 40dff31b
[2710] 1 -> c6b0ffff 40dff31b
[2710] 2 -> 0 0
[2711] 0 -> c6b0ffff 40dff31b
[2711] 1 -> c6b0ffff 40dff31b
[2711] 2 -> 48c0ffff 40dff317
which suggested to me that TLS pointers were surviving exec() calls,
leading to GDT pointers filling up and the eventual failure of
get_free_idx().
I think the solution is flushing the tls array on exec.
Signed-Off-By: Ian Wienand <ianw@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The ia64 oops message doesn't include the kernel version, which
makes it hard to automatically categorize oops messages scraped
from mailing lists and bug databases.
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch removes some redundant code in the function setup_sigcontext().
The registers ar.ccv,b7,r14,ar.csd,ar.ssd,r2-r3 and r16-r31 are not
restored in restore_sigcontext() when (flags & IA64_SC_FLAG_IN_SYSCALL) is
true. So we don't need to zero those variables in setup_sigcontext().
Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ACPI tables follow a tree structure in memory.
The root of the tree is the RSDP (Root System Description Pointer).
To find the RSDP, the OS searches for the signature "RSD PTR "
in well known physical memory locations. Then the OS computes
a table checksum to verify that the signature is really part
of a valid table header.
Some systems have a proper signature but an invalid checksum;
followed elsewhere by a proper signature with valid checksum.
http://bugzilla.kernel.org/show_bug.cgi?id=9444
The Linux RSDP scanning code bailed out on those systems
and as a result they booted with ACPI disabled.
Fix this by deleting the Linux RSDP scanning code and
plugging in the ACPICA RSDP scanning code.
Signed-off-by: Len Brown <len.brown@intel.com>
.. as it it used only during early boot.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
arch/ia64/kernel/acpi.c | 2 +-
arch/x86/kernel/acpi/boot.c | 4 ++--
drivers/acpi/osl.c | 3 ++-
3 files changed, 5 insertions(+), 4 deletions(-)
Signed-off-by: Len Brown <len.brown@intel.com>
If "CPEI Processor Override" bit is not set in "Platform Interrupt
Source Flags" in "Platform Interrupt Sources Structure" in ACPI MADT,
the target processor of CPEI is restricted to a specific CPU. Because
of this, the delivery mode for CPEI should be IOSAPIC_FIXED.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Restore regs->ccr_iip before kreturn probe handler runs. In this way, if
probe handler does unwind, unwind can correctly get the stack trace.
Fixes: http://sourceware.org/bugzilla/show_bug.cgi?id=5051
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
'!' has a higher priority than '&', so as was
this won't test the first bit, but rather evaluates to false for any non-zero
lsapic->lapic_flags.
Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Macro efi_md_size is defined in efi.c, and here we apply it throughout
efi.c.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Rename _bss to __bss_start as on other architectures. That makes it
possible to use the <linux/sections.h> instead of own declarations. Also
add __bss_stop because that symbol exists on other architectures.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Make some IOSAPIC functions static and remove one that is unused.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Not all the return value of __copy_from_user and
__put_user is checked.This patch fixed it.
Signed-off-by: Shi Weihua <shiwh@cn.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
With the unionfs patch applied I get
ERROR: "copy_page" [fs/unionfs/unionfs.ko] undefined!
the other architectures (some, at least) export copy_page() so I guess ia64
should also do so.
To do this we need to move the copy_page() functions out of lib.a and into
built-in.o and add the EXPORT_SYMBOL().
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
/opt/crosstool/gcc-3.4.5-glibc-2.3.6/ia64-unknown-linux-gnu/lib/gcc/ia64-unknown-linux-gnu/3.4.5/../../../../ia64-unknown-linux-gnu/bin/ld: section .data.patch [a000000000000500 -> a000000000000507] overlaps section .dynamic [a0000000000003c8 -> a000000000000507]
This only appears to be a problem with strangely configured
cross-compilation ... native compilers don't have this issue.
But in the interests of helping others at least compile for
ia64, this can go in. -Tony
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
i386 and x86-64 registers System RAM as IORESOURCE_MEM | IORESOURCE_BUSY.
But ia64 registers it as IORESOURCE_MEM only.
In addition, memory hotplug code registers new memory as IORESOURCE_MEM too.
This difference causes a failure of memory unplug of x86-64. This patch
fixes it.
This patch adds IORESOURCE_BUSY to avoid potential overlap mapping by PCI
device.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Luck, Tony" <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On Altix (sn2) machines the "Error parsing MADT" message is
misleading because the lack of IOSAPIC entries is expected.
Since I am sure someone will ask, I have been told that
the chance of this changing anytime soon is close to nil.
Signed-off-by: George Beshers <gbeshers@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>