If we want to stop the tick further than idle, we need to be
able to account the cputime without using the tick.
Virtual based cputime accounting solves that problem by
hooking into kernel/user boundaries.
However, implementing CONFIG_VIRT_CPU_ACCOUNTING requires
low level hooks and involves more overhead. But we already
have a generic context tracking subsystem that is required
for RCU by archs which plan to shut down the tick
outside idle.
This patch implements a generic virtual based cputime
accounting that relies on these generic kernel/user hooks.
There are some upsides to doing this:
- This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
if context tracking is already built (already necessary for RCU in full
tickless mode).
- We can rely on the generic context tracking subsystem to dynamically
(de)activate the hooks, so that we can switch anytime between virtual
and tick based accounting (a rough sketch of this switching follows
below). This way we don't have the overhead of the virtual
accounting when the tick is running periodically.
And one downside:
- There is probably more overhead than a native virtual based cputime
accounting. But this relies on hooks that are already set anyway.
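To illustrate the dynamic switching mentioned above, here is a minimal
sketch in plain C. All names in it (vtime_enabled, struct task_vtime,
hook_user_enter/exit) are invented for the example and are not the
kernel's actual API; in the real implementation the hooks are driven by
the context tracking subsystem.

  #include <stdbool.h>
  #include <stdint.h>

  static bool vtime_enabled; /* set when the tick is about to be stopped */

  struct task_vtime {
      uint64_t stamp;        /* time of the last user/kernel transition */
      uint64_t utime;        /* accumulated user time */
      uint64_t stime;        /* accumulated kernel time */
  };

  /* Hook run when the task enters user space. */
  static void hook_user_enter(struct task_vtime *t, uint64_t now)
  {
      if (!vtime_enabled)
          return;                   /* tick-based accounting is in charge */
      t->stime += now - t->stamp;   /* charge the kernel stretch just ended */
      t->stamp = now;
  }

  /* Hook run when the task comes back into the kernel. */
  static void hook_user_exit(struct task_vtime *t, uint64_t now)
  {
      if (!vtime_enabled)
          return;
      t->utime += now - t->stamp;   /* charge the user stretch just ended */
      t->stamp = now;
  }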
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
commit 74a622b (ia64: vsyscall: Use seqcount instead of seqlock) broke
the ia64 build.
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The update of the vdso data happens under xtime_lock, so adding a
nested lock is pointless. Just use a seqcount to sync the readers.
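As a rough kernel-style sketch of the resulting pattern (the struct and
function names below are illustrative, not the actual ia64 fsyscall/vdso
data structures): the writer, already serialized by xtime_lock, only
bumps the seqcount, while readers retry until they observe a stable
count.

  #include <linux/seqlock.h>
  #include <linux/types.h>

  struct vdso_demo_data {
      seqcount_t seq;       /* replaces the removed nested seqlock */
      u64 wall_time_ns;     /* whatever data the vdso publishes */
  };

  /* Writer side: callers already hold xtime_lock, so only the lockless
   * readers need to be synchronized against the update. */
  static void vdso_demo_update(struct vdso_demo_data *d, u64 ns)
  {
      write_seqcount_begin(&d->seq);
      d->wall_time_ns = ns;
      write_seqcount_end(&d->seq);
  }

  /* Reader side: lockless; retry if an update raced with the read. */
  static u64 vdso_demo_read(struct vdso_demo_data *d)
  {
      unsigned int s;
      u64 ns;

      do {
          s = read_seqcount_begin(&d->seq);
          ns = d->wall_time_ns;
      } while (read_seqcount_retry(&d->seq, s));

      return ns;
  }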
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Paravirtualize ar.itc and ar.itm in order to support save/restore.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch enables ELF note based Xen startup for IA-64, which gives the
kernel an early hint that it is running on Xen, as in the x86 case.
In order to avoid multiple entry points, extending the boot
protocol (i.e. extending struct ia64_boot_param) would presumably be
necessary. That probably means elilo would also need modification.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
After moving the include files there were a few clean-ups:
1) Some files used #include <asm-ia64/xyz.h>, changed to <asm/xyz.h>
2) Some comments alerted maintainers to look at various header files to
make matching updates if certain code were to be changed. Updated these
comments to use the new include paths.
3) Some header files mentioned their own names in initial comments. Just
deleted these self-references.
Signed-off-by: Tony Luck <tony.luck@intel.com>
The sys_getpid() and sys_set_tid_address() behavior changed from
return current->tgid
to
struct pid *pid;
pid = current->pids[PIDTYPE_PID].pid;
return pid->numbers[pid->level].nr;
But the fast system calls on ia64 still operate the old way. Patch them
appropriately to let ia64 work with pid namespaces. Besides, this is one more
step in deprecating pid and tgid on task_struct.
fsys_getppid() is to be patched as well, but its logic is much
more complex now, so I will do that later.
One thing I'm not 100% sure about is the trick with IA64_UPID_SHIFT. In order
to access the pid->level-th element of the array I have to perform the
following calculation
pid + sizeof(struct upid) * pid->level
The problem is that ia64 can only multiply floating-point registers, while all
the offsets I have in the code are in rXX ones. Fortunately, sizeof(struct
upid) is 32 bytes on ia64 (and is very unlikely to ever change), so the
calculation gets simpler:
pid + (pid->level << 5)
So, I introduce IA64_UPID_SHIFT and use the shl instruction. I also
looked at how gcc compiles a similar place and found that it uses a
shift as well. Is this OK to do?
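For clarity, here is the same calculation rendered as plain C. The struct
layouts below are simplified stand-ins (not the real struct pid / struct
upid definitions, and the real code is ia64 assembly), but the offset
arithmetic is the one described above:

  #include <stddef.h>

  #define IA64_UPID_SHIFT 5   /* log2(sizeof(struct upid)) == log2(32) */

  struct upid_demo {          /* simplified stand-in for struct upid */
      int nr;                 /* the pid value seen in this namespace */
      char pad[28];           /* rest of the 32-byte structure */
  };

  struct pid_demo {           /* simplified stand-in for struct pid */
      unsigned int level;
      struct upid_demo numbers[1];  /* really numbers[level + 1] */
  };

  /* pid->numbers[pid->level].nr, computed with a shift instead of a
   * multiply, which is what the shl-based assembly does. */
  static int demo_pid_nr(struct pid_demo *pid)
  {
      char *base = (char *)pid->numbers;
      struct upid_demo *up = (struct upid_demo *)
          (base + ((size_t)pid->level << IA64_UPID_SHIFT));

      return up->nr;
  }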
Tested with the ski emulator on a 2.6.24 kernel, but applies to 2.6.25-rc4
and 2.6.25-rc4-mm1 as well.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: David Mosberger-Tang <davidm@hpl.hp.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Amy Griffis <amy.griffis@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch implements VIRT_CPU_ACCOUNTING for ia64,
which enables us to use more accurate cpu time accounting.
VIRT_CPU_ACCOUNTING is a kernel config option that the s390
and powerpc architectures already have. By turning this config on, these
archs change the mechanism of cpu time accounting from a tick-sampling
based one to a state-transition based one.
The state-transition based accounting is done by checking the time
(the cycle counter in the processor) at every state-transition point,
such as kernel entry/exit, interrupt, softirq, etc.
The difference between two such points is the actual time consumed
in that state. There is no doubt that this value is
more accurate than that of tick-sampling based accounting.
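As a hedged illustration of the scheme (the names below are invented for
the example; on ia64 the time source would be the ar.itc cycle counter),
every transition charges the elapsed cycles to the state being left:

  #include <stdint.h>

  enum cpu_state { ST_USER, ST_KERNEL, ST_IRQ, ST_MAX };

  struct vcpu_acct {
      enum cpu_state state;      /* state the cpu is currently in */
      uint64_t last_cycles;      /* counter value at the last transition */
      uint64_t cycles[ST_MAX];   /* accumulated cycles per state */
  };

  /* Called at every state-transition point (kernel entry/exit,
   * interrupt entry/exit, softirq, ...). */
  static void account_transition(struct vcpu_acct *a, enum cpu_state next,
                                 uint64_t now_cycles)
  {
      a->cycles[a->state] += now_cycles - a->last_cycles;
      a->last_cycles = now_cycles;
      a->state = next;
  }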
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is a merge of Peter Keilty's initial patch for this (which was
revived by Bob Picco) with Hidetoshi Seto's fixes
and scaling improvements.
Acked-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
On a 1.6GHz Montecito Tiger4, the following performance data is measured
with a kernel built with the defconfig, which has NUMA configured:
Fastest sys_getcpu: 502 itc counts.
Fastest fsys_getcpu: 28 itc counts.
fsys_getcpu performance is largely impacted by whether the data
(node_to_cpu_map etc.) is in the cache. fsys_getcpu can take up to ~150 itc
counts in the cold-cache case.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
struct ia64_sal_os_state has three semi-independent sections. The code
in mca_asm.S assumes that these three sections are contiguous, which
makes it very awkward to add new data to this structure. Remove the
assumption that the sections are contiguous. Define a macro to shorten
references to offsets in ia64_sal_os_state.
This patch does not change the way that the code behaves. It just
makes it easier to update the code in future and to add fields to
ia64_sal_os_state when debugging the MCA/INIT handlers.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The bulk of the change. Use per cpu MCA/INIT stacks. Change the SAL
to OS state (sos) to be per process. Do all the assembler work on the
MCA/INIT stacks, leaving the original stack alone. Pass per cpu state
data to the C handlers for MCA and INIT, which also means changing the
mca_drv interfaces slightly. Lots of verification on whether the
original stack is usable before converting it to a sleeping process.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!