linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-21 18:07:21 +07:00

Author	SHA1	Message	Date
Luis R. Rodriguez	e55645ec57	ia64: remove paravirt code All the ia64 pvops code is now dead code since both xen and kvm support have been ripped out [0] [1]. Just that no one had troubled to rip this stuff out. The only useful remaining pieces were the old pvops docs but that was recently also generalized and moved out from ia64 [2]. This has been run time tested on an ia64 Madison system. [0] `003f7de625` "KVM: ia64: remove" since v3.19-rc1 [1] `d52eefb47d` "ia64/xen: Remove Xen support for ia64" since v3.14-rc1 [2] "virtual: Documentation: simplify and generalize paravirt_ops.txt" Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2015-06-10 14:26:32 -07:00
Eric W. Biederman	deb6001509	Fix broken fsys_getppid() In particular fsys_getppid always returns the ppid in the initial pid namespace so it does not work for a process in a pid namespace. Fix from Eric Biederman just removes the fast system call path. While it is a little bit sad to see another one of these bite the dust ... I can't imagine that getppid() is really on any real applications critical path. Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-03-19 16:14:52 -07:00
Frederic Weisbecker	abf917cd91	cputime: Generic on-demand virtual cputime accounting If we want to stop the tick further idle, we need to be able to account the cputime without using the tick. Virtual based cputime accounting solves that problem by hooking into kernel/user boundaries. However implementing CONFIG_VIRT_CPU_ACCOUNTING require low level hooks and involves more overhead. But we already have a generic context tracking subsystem that is required for RCU needs by archs which plan to shut down the tick outside idle. This patch implements a generic virtual based cputime accounting that relies on these generic kernel/user hooks. There are some upsides of doing this: - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING if context tracking is already built (already necessary for RCU in full tickless mode). - We can rely on the generic context tracking subsystem to dynamically (de)activate the hooks, so that we can switch anytime between virtual and tick based accounting. This way we don't have the overhead of the virtual accounting when the tick is running periodically. And one downside: - There is probably more overhead than a native virtual based cputime accounting. But this relies on hooks that are already set anyway. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>	2013-01-27 19:23:27 +01:00
Andi Kleen	4035c6db5a	[IA64] Liberate the signal layer from IA64 assembler Currently IA64 has a assembler implementation of sigrtprocmask. Having a single architecture implement this in assembler language is a serious maintenance problem that inhibits further evolution of the signal subsystem. Everyone who wants to do deep changes to signals would need to learn that assembler language. Whatever performance improvements IA64 gets from this it cannot be worth the price in maintainability. We have some locking problems in signal that need to be fixed, but this roadblock needs to be removed first. So just disable the special assembler IA64 implementation and fall back to a normal syscall there. Acked-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-05-16 14:09:55 -07:00
Tony Luck	7411d89535	[IA64] Fix fast syscall version of getcpu() GETCPU(2) says: int getcpu(unsigned cpu, unsigned node, struct getcpu_cache *tcache); ... When either cpu or node is NULL nothing is written to the respective pointer. But the fast system call path had no checks for NULL, and would thus return -EFAULT if either (or both) of these were NULL. Reported-by: Mike Frysinger <vapier@gentoo.org> Tested-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2012-05-16 13:58:29 -07:00
Linus Torvalds	bcd550745f	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner. * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ia64: vsyscall: Add missing paranthesis alarmtimer: Don't call rtc_timer_init() when CONFIG_RTC_CLASS=n x86: vdso: Put declaration before code x86-64: Inline vdso clock_gettime helpers x86-64: Simplify and optimize vdso clock_gettime monotonic variants kernel-time: fix s/then/than/ spelling errors time: remove no_sync_cmos_clock time: Avoid scary backtraces when warning of > 11% adj alarmtimer: Make sure we initialize the rtctimer ntp: Fix leap-second hrtimer livelock x86, tsc: Skip refined tsc calibration on systems with reliable TSC rtc: Provide flag for rtc devices that don't support UIE ia64: vsyscall: Use seqcount instead of seqlock x86: vdso: Use seqcount instead of seqlock x86: vdso: Remove bogus locking in update_vsyscall_tz() time: Remove bogus comments time: Fix change_clocksource locking time: x86: Fix race switching from vsyscall to non-vsyscall clock	2012-03-29 14:16:48 -07:00
David Howells	c140d87995	Disintegrate asm/system.h for IA64 Disintegrate asm/system.h for IA64. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Tony Luck <tony.luck@intel.com> cc: linux-ia64@vger.kernel.org	2012-03-28 18:30:02 +01:00
Thomas Gleixner	74a622be3d	ia64: vsyscall: Use seqcount instead of seqlock The update of the vdso data happens under xtime_lock, so adding a nested lock is pointless. Just use a seqcount to sync the readers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2012-03-15 18:17:59 -07:00
Petr Tesarik	2d2b690164	[IA64] Optimize ticket spinlocks in fsys_rt_sigprocmask Tony's fix (`f574c84319`) has a small bug, it incorrectly uses "r3" as a scratch register in the first of the two unlock paths ... it is also inefficient. Optimize the fast path again. Signed-off-by: Petr Tesarik <ptesarik@suse.cz> Signed-off-by: Tony Luck <tony.luck@intel.com>	2010-09-15 15:35:48 -07:00
Tony Luck	f574c84319	[IA64] fix siglock When ia64 converted to using ticket locks, an inline implementation of trylock/unlock in fsys.S was missed. This was not noticed because in most circumstances it simply resulted in using the slow path because the siglock was apparently not available (under old spinlock rules). Problems occur when the ticket spinlock has value 0x0 (when first initialised, or when it wraps around). At this point the fsys.S code acquires the lock (changing the 0x0 to 0x1. If another process attempts to get the lock at this point, it will change the value from 0x1 to 0x2 (using new ticket lock rules). Then the fsys.S code will free the lock using old spinlock rules by writing 0x0 to it. From here a variety of bad things can happen. Signed-off-by: Tony Luck <tony.luck@intel.com>	2010-09-09 15:16:56 -07:00
Isaku Yamahata	94752a794d	ia64/pv_ops: paravirtualize mov = ar.itc. paravirtualize mov reg = ar.itc. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Tony Luck <tony.luck@intel.com>	2009-03-26 10:50:22 -07:00
Isaku Yamahata	84b8857a03	ia64/pv_ops: paravirtualize fsys.S. paravirtualize fsys.S. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Tony Luck <tony.luck@intel.com>	2009-03-26 10:50:01 -07:00
Isaku Yamahata	dd97d5cb54	ia64/pv_ops: add hooks to paravirtualize fsyscall implementation. Add two hooks, paravirt_get_fsyscall_table() and paravirt_get_fsys_bubble_doen() to paravirtualize fsyscall implementation. This patch just add the hooks fsyscall and don't paravirtualize it. Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Signed-off-by: Tony Luck <tony.luck@intel.com>	2009-03-26 10:48:33 -07:00
Tony Luck	71b264f85f	Pull miscellaneous into release branch Conflicts: arch/ia64/kernel/mca.c	2008-04-17 10:14:51 -07:00
Tony Luck	14d0647c98	Pull virt-cpu-accounting into release branch	2008-04-17 10:12:27 -07:00
Pavel Emelyanov	96ded9dadd	[IA64] fix getpid and set_tid_address fast system calls for pid namespaces The sys_getpid() and sys_set_tid_address() behavior changed from return current->tgid to struct pid pid; pid = current->pids[PIDTYPE_PID].pid; return pid->numbers[pid->level].nr; But the fast system calls on ia64 still operate the old way. Patch them appropriately to let ia64 work with pid namespaces. Besides, this is one more step in deprecating of pid and tgid on task_struct. The fsys_getppid() is to be patched as well, but its logic is much more complex now, so I will make it later. One thing I'm not 100% sure is the trick with the IA64_UPID_SHIFT. On order to access the pid->level's element of an array I have to perform the following calculations pid + sizeof(struct upid) pid->level The problem is that ia64 can only multiply float point registers, while all the offsets I have in code are in rXX ones. Fortunately, the sizeof(struct upid) is 32 bytes on ia64 (and is very unlikely to ever change), so the calculations get simpler: pid + pid->level << 5 So, I introduce the IA64_UPID_SHIFT and use the shl instruction. I also looked at how gcc compiles the similar place and found that it makes it with shift as well. Is this OK to do so? Tested with ski emulator with 2.6.24 kernel, but fits 2.6.25-rc4 and 2.6.25-rc4-mm1 as well. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: David Mosberger-Tang <davidm@hpl.hp.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Amy Griffis <amy.griffis@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2008-04-09 10:33:36 -07:00
Hidetoshi Seto	4fe01c68eb	[IA64] cleanup and improve fsys_gettimeofday This patch does: - Remove outdated comments (which someday I marked with "?"). - Reassemble instructions to fit them in fewer bundles. - If McKinley Errata 9 workaround is not needed, the workaround bundles will be patched out with NOPs. However it also not needed to have a totally NOP bundle (nop * 3) before branch. As a result, this makes the code path 3 (or 2) bundles shorter (and remove 1 unnecessary stop bit). It seems to be 1% faster. (10sec loop test, with nojitter @ Madison 1.5GHz x 4) Before: CPU 0: 0.14 (usecs) (0 errors / 69598875 iterations) CPU 1: 0.14 (usecs) (0 errors / 69630721 iterations) CPU 2: 0.14 (usecs) (0 errors / 69607850 iterations) CPU 3: 0.14 (usecs) (0 errors / 69619832 iterations) After: CPU 0: 0.14 (usecs) (0 errors / 70257728 iterations) CPU 1: 0.14 (usecs) (0 errors / 70309498 iterations) CPU 2: 0.14 (usecs) (0 errors / 70280639 iterations) CPU 3: 0.14 (usecs) (0 errors / 70260682 iterations) Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2008-03-10 16:35:47 -07:00
Hidetoshi Seto	b64f34cdfe	[IA64] VIRT_CPU_ACCOUNTING (accurate cpu time accounting) This patch implements VIRT_CPU_ACCOUNTING for ia64, which enable us to use more accurate cpu time accounting. The VIRT_CPU_ACCOUNTING is an item of kernel config, which s390 and powerpc arch have. By turning this config on, these archs change the mechanism of cpu time accounting from tick-sampling based one to state-transition based one. The state-transition based accounting is done by checking time (cycle counter in processor) at every state-transition point, such as entrance/exit of kernel, interrupt, softirq etc. The difference between point to point is the actual time consumed during in the state. There is no doubt about that this value is more accurate than that of tick-sampling based accounting. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2008-02-20 12:55:37 -08:00
Tony Luck	0aa366f351	[IA64] Convert to generic timekeeping/clocksource This is a merge of Peter Keilty's initial patch (which was revived by Bob Picco) for this with Hidetoshi Seto's fixes and scaling improvements. Acked-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2007-07-20 11:22:30 -07:00
Hidetoshi Seto	829a999625	[IA64] ar.itc access must really be after xtime_lock.sequence has been read The ".acq" semantics of the load only apply w.r.t. other data access. Reading the clock (ar.itc) isn't a data access so strange things can happen here. Specifically the read of ar.itc can be launched as soon as the read of xtime_lock.sequence is ISSUED. Since this may cache miss, and that might cause a thread switch, and there may be cache contention for the line containing xtime_lock, it may be a long time before the actual value is returned, so the ar.itc value may be very stale. Move the consumption of r28 up before the read of ar.itc to make sure that we really have got the current value of xtime_lock.sequence before look at ar.itc. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2007-07-13 16:21:44 -07:00
Fenghua Yu	3bc207d2b7	[IA64] fsys_getcpu for IA64 On 1.6GHz Montectio Tiger4, the following performance data is measured with kernel built with defconfig which has NUMA configured: Fastest sys_getcpu: 502 itc counts. Fastest fsys_getcpu: 28 itc counts. fsys_getcpu performance is largly impacted by whether data (node_to_cpu_map etc) is in cache. It can take fsys_getcpu up to ~150 itc counts in cold cache case. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2007-03-07 16:27:09 -08:00
Ken Chen	c8c1635faa	[IA64] cleanup in fsys.S beautify coding style for zeroing end of fsyscall_table entries. Remove misleading __NR_syscall_last and add more comments. Drop (now unneeded) "guard against failure to increase NR_syscalls" Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2006-02-28 08:53:32 -08:00
Chen, Kenneth W	9ed2ad8648	[IA64] add syscall entry for at() Wire up the ia64 syscalls for at() functions. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2006-02-06 10:42:46 -08:00
Tony Luck	7ae69d2aa4	[IA64] Add stub entry to fsys.S for sys_migrate_pages When this new syscall was added to ia64 in commit `39743889aa` fsys.S was forgotten. Add a ".data8 0" there to keep it in step. [Reported by Stephane Eranian] Signed-off-by: Tony Luck <tony.luck@intel.com>	2006-01-13 10:03:58 -08:00
Sam Ravnborg	39e01cb874	kbuild: ia64 use generic asm-offsets.h support Delete obsolete stuff from arch Makefile Rename file to asm-offsets.h The trick used in the arch Makefile to circumvent the circular dependency is kept. Signed-off-by: Sam Ravnborg <sam@ravnborg.org>	2005-09-09 22:03:13 +02:00
Tony Luck	f2cbb4f019	Auto merge with /home/aegl/GIT/linus	2005-06-15 14:06:48 -07:00
Christoph Lameter	a2a64769d0	[IA64] Fix race condition in the rt_sigprocmask fastcall current->blocked will be set to the value of current->thread_info->flags if the cmpxchg to update thread_info->flags fails. For performance reasons the store into current->blocked was placed in the cmpxchg loop. However, the cmpxchg overwrites the register holding the value to be stored. In the rare case of a retry the value of thread_info->flags will be written into current->blocked. The fix is to use another register so that the register containing the current->blocked value is not overwritten. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-06-09 13:04:30 -07:00
David Mosberger-Tang	ebcc80c1b6	[IA64] Merge audit fix for fsyscalls with syscall-optimizations Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-05-05 11:30:48 -07:00
Amy Griffis	3ac3ed555b	[PATCH] fix ia64 syscall auditing Attached is a patch against David's audit.17 kernel that adds checks for the TIF_SYSCALL_AUDIT thread flag to the ia64 system call and signal handling code paths.The patch enables auditing of system calls set up via fsys_bubble_down, as well as ensuring that audit_syscall_exit() is called on return from sigreturn. Neglecting to check for TIF_SYSCALL_AUDIT at these points results in incorrect information in audit_context, causing frequent system panics when system call auditing is enabled on an ia64 system. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2005-04-29 16:12:55 +01:00
David Mosberger-Tang	fbf7192ba0	[IA64] Annotate fsys_bubble_down() with McKinley dispatch info. This patch changes comments & formatting only. There is no code change. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-27 21:21:26 -07:00
David Mosberger-Tang	1ba7be7d69	[IA64] Reschedule fsys_bubble_down(). Improvements come from eliminating srlz.i, not scheduling AR/CR-reads too early (while there are others still pending), scheduling the backing-store switch as well as possible, splitting the BBB bundle into a MIB/MBB pair. Why is it safe to eliminate the srlz.i? Observe that we used to clear bits ~PSR_PRESERVED_BITS in PSR.L. Since PSR_PRESERVED_BITS==PSR.{UP,MFL,MFH,PK,DT,PP,SP,RT,IC}, we ended up clearing PSR.{BE,AC,I,DFL,DFH,DI,DB,SI,TB}. However, PSR.BE : already is turned off in __kernel_syscall_via_epc() PSR.AC : don't care (kernel normally turns PSR.AC on) PSR.I : already turned off by the time fsys_bubble_down gets invoked PSR.DFL: always 0 (kernel never turns it on) PSR.DFH: don't care --- kernel never touches f32-f127 on its own initiative PSR.DI : always 0 (kernel never turns it on) PSR.SI : always 0 (kernel never turns it on) PSR.DB : don't care --- kernel never enables kernel-level breakpoints PSR.TB : must be 0 already; if it wasn't zero on entry to __kernel_syscall_via_epc, the branch to fsys_bubble_down will trigger a taken branch; the taken-trap-handler then converts the syscall into a break-based system-call. In other words: all the bits we're clearying are either 0 already or are don't cares! Thus, we don't have to write PSR.L at all and we don't have to do a srlz.i either. Good for another ~20 cycle improvement for EPC-based heavy-weight syscalls. Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-04-27 21:20:51 -07:00
Linus Torvalds	1da177e4c3	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!	2005-04-16 15:20:36 -07:00

32 Commits