linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-04 02:56:43 +07:00

Author	SHA1	Message	Date
Hiroshi Shimamoto	ec5d498991	sched: fix deadlock in setting scheduler parameter to zero Andrei Gusev wrote: > I played witch scheduler settings. After doing something like: > echo -n 1000000 >sched_rt_period_us > > command is locked. I found in kernel.log: > > Sep 11 00:39:34 zaratustra > Sep 11 00:39:34 zaratustra Pid: 4495, comm: bash Tainted: G W > (2.6.26.3 #12) > Sep 11 00:39:34 zaratustra EIP: 0060:[<c0213fc7>] EFLAGS: 00210246 CPU: 0 > Sep 11 00:39:34 zaratustra EIP is at div64_u64+0x57/0x80 > Sep 11 00:39:34 zaratustra EAX: 0000389f EBX: 00000000 ECX: 00000000 > EDX: 00000000 > Sep 11 00:39:34 zaratustra ESI: d9800000 EDI: d9800000 EBP: 0000389f > ESP: ea7a6edc > Sep 11 00:39:34 zaratustra DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 > Sep 11 00:39:34 zaratustra Process bash (pid: 4495, ti=ea7a6000 > task=ea744000 task.ti=ea7a6000) > Sep 11 00:39:34 zaratustra Stack: 00000000 000003e8 d9800000 0000389f > c0119042 00000000 00000000 00000001 > Sep 11 00:39:34 zaratustra 00000000 00000000 ea7a6f54 00010000 00000000 > c04d2e80 00000001 000e7ef0 > Sep 11 00:39:34 zaratustra c01191a3 00000000 00000000 ea7a6fa0 00000001 > ffffffff c04d2e80 ea5b2480 > Sep 11 00:39:34 zaratustra Call Trace: > Sep 11 00:39:34 zaratustra [<c0119042>] __rt_schedulable+0x52/0x130 > Sep 11 00:39:34 zaratustra [<c01191a3>] sched_rt_handler+0x83/0x120 > Sep 11 00:39:34 zaratustra [<c01a76a6>] proc_sys_call_handler+0xb6/0xd0 > Sep 11 00:39:34 zaratustra [<c01a76c0>] proc_sys_write+0x0/0x20 > Sep 11 00:39:34 zaratustra [<c01a76d9>] proc_sys_write+0x19/0x20 > Sep 11 00:39:34 zaratustra [<c016cc68>] vfs_write+0xa8/0x140 > Sep 11 00:39:34 zaratustra [<c016cdd1>] sys_write+0x41/0x80 > Sep 11 00:39:34 zaratustra [<c0103051>] sysenter_past_esp+0x6a/0x91 > Sep 11 00:39:34 zaratustra ======================= > Sep 11 00:39:34 zaratustra Code: c8 41 0f ad f3 d3 ee f6 c1 20 0f 45 de > 31 f6 0f ad ef d3 ed f6 c1 20 0f 45 fd 0f 45 ee 31 c9 39 eb 89 fe 89 ea > 77 08 89 e8 31 d2 <f7> f3 89 c1 89 f0 8b 7c 
24 08 f7 f3 8b 74 24 04 89 > ca 8b 1c 24 > Sep 11 00:39:34 zaratustra EIP: [<c0213fc7>] div64_u64+0x57/0x80 SS:ESP > 0068:ea7a6edc > Sep 11 00:39:34 zaratustra ---[ end trace 4eaa2a86a8e2da22 ]--- fix the boundary condition. sysctl_sched_rt_period=0 makes exception at to_ratio(). Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-11 09:39:18 +02:00
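The boundary condition in the commit above can be sketched in user space as follows. This is a minimal illustration, not the kernel patch itself: the name `to_ratio_checked`, the `RATIO_SHIFT` constant, and the choice to return `-EINVAL` are assumptions made for the example. The only behaviour taken from the commit message is that a period of zero must be rejected before the division runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed fixed-point scale for the runtime/period ratio (hypothetical). */
#define RATIO_SHIFT 16

/*
 * Hypothetical helper: return runtime/period as a fixed-point ratio,
 * or -EINVAL when the period is zero, so the division can never trap.
 */
static int64_t to_ratio_checked(uint64_t period_us, uint64_t runtime_us)
{
    /* The boundary condition: a zero period would divide by zero. */
    if (period_us == 0)
        return -EINVAL;

    /* Precondition so the left shift below cannot overflow. */
    assert(runtime_us <= (UINT64_MAX >> RATIO_SHIFT));

    return (int64_t)((runtime_us << RATIO_SHIFT) / period_us);
}
```

Calling `to_ratio_checked(0, 950000)` returns `-EINVAL` instead of faulting in the divide, which is the failure the oops above shows in `div64_u64`.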
Zhang, Yanmin	baf25731e5	sched: fix 2.6.27-rc5 couldn't boot on tulsa machine randomly On my tulsa x86-64 machine, kernel 2.6.27-rc5 couldn't boot randomly. Basically, function __enable_runtime forgets to reset rt_rq->rt_throttled to 0. When every cpu is up, per-cpu migration_thread is created and it runs very fast, sometimes to mark the corresponding rt_rq->rt_throttled to 1 very quickly. After all cpus are up, with below calling chain: sched_init_smp => arch_init_sched_domains => build_sched_domains => ... => cpu_attach_domain => rq_attach_root => set_rq_online => ... => __enable_runtime __enable_runtime is called against every rt_rq again, so rt_rq->rt_time is reset to 0, but rt_rq->rt_throttled might be still 1. Later on function do_sched_rt_period_timer couldn't reset it, and all RT tasks couldn't be scheduled to run on that cpu. An example of such an RT task is migration_thread, which is woken up when a task is migrated to another cpu. Below patch fixes it against 2.6.27-rc5. Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-11 09:34:28 +02:00
Ingo Molnar	429b022af4	Merge commit 'v2.6.27-rc6' into core/rcu	2008-09-10 08:35:40 +02:00
Thomas Gleixner	61c22c34c6	clockevents: remove WARN_ON which was used to gather information The issue of the endless reprogramming loop due to a too small min_delta_ns was fixed with the previous updates of the clock events code, but we had no information about the spread of this problem. I added a WARN_ON to get automated information via kerneloops.org and to get some direct reports, which allowed me to analyse the affected machines. The WARN_ON has served its purpose and would be annoying for a release kernel. Remove it and just keep the information about the increase of the min_delta_ns value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-09 22:20:01 +02:00
Linus Torvalds	e1d7bf1499	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: arch_reinit_sched_domains() must destroy domains to force rebuild sched, cpuset: rework sched domains and CPU hotplug handling (v4)	2008-09-08 15:47:21 -07:00
Manfred Spraul	e545a6140b	kernel/cpu.c: create a CPU_STARTING cpu_chain notifier Right now, there is no notifier that is called on a new cpu, before the new cpu begins processing interrupts/softirqs. Various kernel function would need that notification, e.g. kvm works around by calling smp_call_function_single(), rcu polls cpu_online_map. The patch adds a CPU_STARTING notification. It also adds a helper function that sends the message to all cpu_chain handlers. Tested on x86-64. All other archs are untested. Especially on sparc, I'm not sure if I got it right. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-08 19:25:24 +02:00
Linus Torvalds	f532522565	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clocksource, acpi_pm.c: check for monotonicity clocksource, acpi_pm.c: use proper read function also in errata mode ntp: fix calculation of the next jiffie to trigger RTC sync x86: HPET: read back compare register before reading counter x86: HPET fix moronic 32/64bit thinko clockevents: broadcast fixup possible waiters HPET: make minimum reprogramming delta useful clockevents: prevent endless loop lockup clockevents: prevent multiple init/shutdown clockevents: enforce reprogram in oneshot setup clockevents: prevent endless loop in periodic broadcast handler clockevents: prevent clockevent event_handler ending up handler_noop	2008-09-06 19:33:26 -07:00
Ingo Molnar	291c54ff76	Merge branch 'sched/cpuset' into sched/urgent	2008-09-06 21:03:16 +02:00
Max Krasnyansky	dfb512ec48	sched: arch_reinit_sched_domains() must destroy domains to force rebuild What I realized recently is that calling rebuild_sched_domains() in arch_reinit_sched_domains() by itself is not enough when cpusets are enabled. partition_sched_domains() code is trying to avoid unnecessary domain rebuilds and will not actually rebuild anything if new domain masks match the old ones. What this means is that doing echo 1 > /sys/devices/system/cpu/sched_mc_power_savings on a system with cpusets enabled will not take effect until something changes in the cpuset setup (i.e. new sets created or deleted). This patch restores the correct behaviour where domains must be rebuilt in order to enable MC powersaving flags. Tested on quad-core Core2 box with both CONFIG_CPUSETS and !CONFIG_CPUSETS. Also tested on dual-core Core2 laptop. Lockdep is happy and things are working as expected. Signed-off-by: Max Krasnyansky <maxk@qualcomm.com> Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-06 19:22:15 +02:00
Manfred Spraul	3ba35573ad	kernel/cpu.c: Move the CPU_DYING notifiers When a cpu is taken offline, the CPU_DYING notifiers are called on the dying cpu. According to <linux/notifiers.h>, the cpu should be "not running any task, not handling interrupts, soon dead". For the current implementation, this is not true: - __cpu_disable can fail. If it fails, then the cpu will remain alive and happy. - At least on x86, __cpu_disable() briefly enables the local interrupts to handle any outstanding interrupts. What about moving CPU_DYING down a few lines, behind the __cpu_disable() line? There are only two CPU_DYING handlers in the kernel right now: one in kvm, one in the scheduler. Both should work with the patch applied [and: I'm not sure if either one handles a failing __cpu_disable()] The patch survives simple offlining a cpu. kvm untested due to lack of a test setup. Signed-off-By: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-06 19:13:59 +02:00
Gautham R Shenoy	38736f4750	sched: fix __load_balance_iterator() for cfq with only one task The __load_balance_iterator() returns a NULL when there's only one sched_entity which is a task. It is caused by the following code-path. /* Skip over entities that are not tasks */ do { se = list_entry(next, struct sched_entity, group_node); next = next->next; } while (next != &cfs_rq->tasks && !entity_is_task(se)); if (next == &cfs_rq->tasks) return NULL; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This will return NULL even when se is a task. As a side-effect, there was a regression in sched_mc behavior since 2.6.25, since iter_move_one_task() when it calls load_balance_start_fair(), would not get any tasks to move! Fix this by checking if the last entity was a task or not. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-06 16:53:34 +02:00
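The iterator bug in the commit above can be reproduced and fixed on a toy circular list. This is a sketch under stated assumptions: `struct entity`, `next_task`, and `first_task_index` are hypothetical stand-ins for the kernel's `sched_entity` list walk, and the final check is one plausible way to "check if the last entity was a task", not necessarily the exact change that was merged.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTITIES 8

/* Toy stand-in for a scheduling entity on a circular list with a sentinel head. */
struct entity {
    struct entity *next;
    int is_task;
};

/* Return the first task at or after *cursor and advance *cursor, or NULL if none. */
static struct entity *next_task(struct entity *head, struct entity **cursor)
{
    struct entity *next = *cursor;
    struct entity *se;

    if (next == head)
        return NULL;

    /* Skip over entities that are not tasks. */
    do {
        se = next;
        next = next->next;
    } while (next != head && !se->is_task);

    /*
     * The buggy form was "if (next == head) return NULL;", which also
     * discards se when se is the last entity AND a task.
     */
    if (next == head && !se->is_task)
        return NULL;

    *cursor = next;
    return se;
}

/* Helper for experiments: build a list from flags, return index of first task or -1. */
static int first_task_index(const int *is_task, int n)
{
    struct entity head, nodes[MAX_ENTITIES];
    struct entity *cursor, *se;

    assert(n >= 0 && n <= MAX_ENTITIES);

    head.next = &head;
    head.is_task = 0;
    for (int i = n - 1; i >= 0; i--) {
        nodes[i].is_task = is_task[i];
        nodes[i].next = head.next;
        head.next = &nodes[i];
    }

    cursor = head.next;
    se = next_task(&head, &cursor);
    return se ? (int)(se - nodes) : -1;
}
```

With a single entity that is a task, the buggy check would report no task at all, while this version returns index 0.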
Ingo Molnar	7f79d852ed	Merge branch 'linus' into sched/devel	2008-09-06 16:51:57 +02:00
Maciej W. Rozycki	4ff4b9e19a	ntp: fix calculation of the next jiffie to trigger RTC sync We have a bug in the calculation of the next jiffie to trigger the RTC synchronisation. The aim here is to run sync_cmos_clock() as close as possible to the middle of a second. Which means we want this function to be called less than or equal to half a jiffie away from when now.tv_nsec equals 5e8 (500000000). If this is not the case for a given call to the function, for this purpose instead of updating the RTC we calculate the offset in nanoseconds to the next point in time where now.tv_nsec will be equal 5e8. The calculated offset is then converted to jiffies as these are the unit used by the timer. However timespec_to_jiffies() used here uses a ceil()-type rounding mode, where the resulting value is rounded up. As a result the range of now.tv_nsec when the timer will trigger is from 5e8 to 5e8 + TICK_NSEC rather than the desired 5e8 - TICK_NSEC / 2 to 5e8 + TICK_NSEC / 2. As a result if for example sync_cmos_clock() happens to be called at the time when now.tv_nsec is between 5e8 + TICK_NSEC / 2 and 5e8 + TICK_NSEC, it will simply be rescheduled HZ jiffies later, falling in the same range of now.tv_nsec again. Similarly for cases offset by an integer multiple of TICK_NSEC. This change addresses the problem by subtracting TICK_NSEC / 2 from the nanosecond offset to the next point in time where now.tv_nsec will be equal 5e8, effectively shifting the following rounding in timespec_to_jiffies() so that it produces a rounded-to-nearest result. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-06 15:31:48 +02:00
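The rounding argument in the commit above can be checked with a small calculation. This sketch assumes HZ=100 (so `TICK_NSEC` is 10 ms) and models `timespec_to_jiffies()` as a plain ceiling division; both are simplifications, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
#define TICK_NSEC    10000000LL   /* assumes HZ=100 for illustration */

/* Ceil-style conversion, a simplified model of timespec_to_jiffies(). */
static int64_t ns_to_jiffies_ceil(int64_t ns)
{
    return (ns + TICK_NSEC - 1) / TICK_NSEC;
}

/* Jiffies until the next mid-second point, with an optional bias subtracted first. */
static int64_t delay_jiffies(int64_t now_nsec, int64_t bias_ns)
{
    int64_t delay;

    assert(now_nsec >= 0 && now_nsec < NSEC_PER_SEC);

    delay = NSEC_PER_SEC / 2 - now_nsec - bias_ns;
    if (delay <= 0)
        delay += NSEC_PER_SEC;   /* aim for the middle of the next second */
    return ns_to_jiffies_ceil(delay);
}

/* Behaviour before the change: ceiling only, no bias. */
static int64_t delay_jiffies_ceil_only(int64_t now_nsec)
{
    return delay_jiffies(now_nsec, 0);
}

/* Behaviour after the change: subtract half a tick so the ceiling rounds to nearest. */
static int64_t delay_jiffies_nearest(int64_t now_nsec)
{
    return delay_jiffies(now_nsec, TICK_NSEC / 2);
}
```

With `now_nsec` at 506 ms, the ceiling-only version waits 100 jiffies and lands at 506 ms again, while the biased version waits 99 jiffies and lands at 496 ms, inside the half-tick window around 5e8.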
Krzysztof Helt	c8bfff6dd4	sched: compilation fix with gcc 3.4.6 I found that 2.6.27-rc5-mm1 does not compile with gcc 3.4.6. The error is: CC kernel/sched.o kernel/sched.c: In function `start_rt_bandwidth': kernel/sched.c:208: sorry, unimplemented: inlining failed in call to 'rt_bandwidth_enabled': function body not available kernel/sched.c:214: sorry, unimplemented: called from here make[1]: * [kernel/sched.o] Error 1 make: * [kernel] Error 2 It seems that the gcc 3.4.6 requires full inline definition before first usage. The patch below fixes the compilation problem. Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> (if needed> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-06 15:17:09 +02:00
Thomas Gleixner	7300711e8c	clockevents: broadcast fixup possible waiters Until the C1E patches arrived there were no users of periodic broadcast before switching to oneshot mode. Now we need to trigger a possible waiter for a periodic broadcast when switching to oneshot mode. Otherwise we can starve them forever. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-09-06 07:21:17 +02:00
Balbir Singh	49048622ea	sched: fix process time monotonicity Spencer reported a problem where utime and stime were going negative despite the fixes in commit `b27f03d4bd`. The suspected reason for the problem is that signal_struct maintains its own utime and stime (of exited tasks), these are not updated using the new task_utime() routine, hence sig->utime can go backwards and cause the same problem to occur (sig->utime, adds tsk->utime and not task_utime()). This patch fixes the problem. TODO: using max(task->prev_utime, derived utime) works for now, but a more generic solution is to implement cputime_max() and use the cputime_gt() function for comparison. Reported-by: spencer@bluehost.com Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 18:14:35 +02:00
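The `max(task->prev_utime, derived utime)` idea mentioned in the TODO above amounts to a monotonic clamp. The following is a minimal sketch of that idea only; `struct task_times` and `report_utime` are hypothetical names, and the real accounting code tracks more state than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task record of the last user time that was reported. */
struct task_times {
    uint64_t prev_utime;
};

/*
 * Report a user time that never goes backwards: if the newly derived
 * value is smaller than what was reported before, keep the old value.
 */
static uint64_t report_utime(struct task_times *t, uint64_t derived_utime)
{
    assert(t != NULL);

    if (derived_utime > t->prev_utime)
        t->prev_utime = derived_utime;
    return t->prev_utime;
}
```

A caller that sums per-task values into a group total would need to use the clamped value consistently, which is the gap the commit message describes for `sig->utime`.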
Peter Zijlstra	56c7426b39	sched_clock: fix NOHZ interaction If HLT stops the TSC, we'll fail to account idle time, thereby inflating the actual process times. Fix this by re-calibrating the clock against GTOD when leaving nohz mode. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 18:14:08 +02:00
Thomas Gleixner	1fb9b7d29d	clockevents: prevent endless loop lockup The C1E/HPET bug reports on AMDX2/RS690 systems were tracked down to a too small value of the HPET minimum delta for programming an event. The clockevents code needs to enforce an interrupt event on the clock event device in some cases. The enforcement code was stupid and naive, as it just added the minimum delta to the current time and tried to reprogram the device. When the minimum delta is too small, then this loops forever. Add a sanity check. Allow reprogramming to fail 3 times, then print a warning and double the minimum delta value to make sure that this does not happen again. Use the same function for both tick-oneshot and tick-broadcast code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 11:11:53 +02:00
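The retry policy described in the commit above (three failed attempts, then double the minimum delta) can be modelled with a fake device. Everything here is an assumption for illustration: `struct fake_clock_dev`, `fake_program`, and `force_program_min_delta` are hypothetical, and the fake device simply rejects any delta shorter than a fixed latency, whereas real hardware fails because time advances while it is being programmed.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of a clock event device. */
struct fake_clock_dev {
    uint64_t min_delta_ns;   /* smallest delta the code will try to program */
    uint64_t hw_latency_ns;  /* deltas shorter than this have already expired */
};

/* Programming fails with -ETIME when the requested delta is too short. */
static int fake_program(const struct fake_clock_dev *dev, uint64_t delta_ns)
{
    return delta_ns >= dev->hw_latency_ns ? 0 : -ETIME;
}

/*
 * Force an event at the minimum delta. After three failures the minimum
 * delta is doubled, so the loop terminates instead of spinning forever.
 * Returns the minimum delta that finally worked.
 */
static uint64_t force_program_min_delta(uint64_t min_delta_ns,
                                        uint64_t hw_latency_ns)
{
    struct fake_clock_dev dev = { min_delta_ns ? min_delta_ns : 1,
                                  hw_latency_ns };

    /* Keeps the doubling below from overflowing in this model. */
    assert(hw_latency_ns <= UINT64_MAX / 2);

    for (;;) {
        for (int i = 0; i < 3; i++) {
            if (fake_program(&dev, dev.min_delta_ns) == 0)
                return dev.min_delta_ns;
        }
        /* The kernel prints a warning at this point; omitted here. */
        dev.min_delta_ns *= 2;
    }
}
```

Starting from a 1000 ns minimum against a 3500 ns latency, the model settles on 4000 ns after two doublings; a minimum that is already large enough is returned unchanged.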
Thomas Gleixner	9c17bcda99	clockevents: prevent multiple init/shutdown While chasing the C1E/HPET bugreports I went through the clock events code inch by inch and found that the broadcast device can be initialized and shutdown multiple times. Multiple shutdowns are not critical, but useless waste of time. Multiple initializations are simply broken. Another CPU might have the device in use already after the first initialization and the second init could just render it unusable again. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 11:11:52 +02:00
Thomas Gleixner	7205656ab4	clockevents: enforce reprogram in oneshot setup In tick_oneshot_setup we program the device to the given next_event, but we do not check the return value. We need to make sure that the device is programmed enforced so the interrupt handler engine starts working. Split out the reprogramming function from tick_program_event() and call it with the device, which was handed in to tick_setup_oneshot(). Set the force argument, so the devices is firing an interrupt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 11:11:52 +02:00
Thomas Gleixner	d4496b3955	clockevents: prevent endless loop in periodic broadcast handler The reprogramming of the periodic broadcast handler was broken, when the first programming returned -ETIME. The clockevents code stores the new expiry value in the clock events device next_event field only when the programming time has not elapsed yet. The loop in question calculates the new expiry value from the next_event value and therefore never increases. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 11:11:51 +02:00
Venkatesh Pallipadi	7c1e768974	clockevents: prevent clockevent event_handler ending up handler_noop There is a ordering related problem with clockevents code, due to which clockevents_register_device() called after tickless/highres switch will not work. The new clockevent ends up with clockevents_handle_noop as event handler, resulting in no timer activity. The problematic path seems to be * old device already has hrtimer_interrupt as the event_handler * new clockevent device registers with a higher rating * tick_check_new_device() is called * clockevents_exchange_device() gets called * old->event_handler is set to clockevents_handle_noop * tick_setup_device() is called for the new device * which sets new->event_handler using the old->event_handler which is noop. Change the ordering so that new device inherits the proper handler. This does not have any issue in normal case as most likely all the clockevent devices are setup before the highres switch. But, can potentially be affecting some corner case where HPET force detect happens after the highres switch. This was a problem with HPET in MSI mode code that we have been experimenting with. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-09-05 11:11:51 +02:00
Al Viro	b380b0d4f7	forgotten refcount on sysctl root table We should've set refcount on the root sysctl table; otherwise we'll blow up the first time we get down to zero dynamically registered sysctl tables. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-09-04 11:06:21 -07:00
John Kacur	9d35935747	pm_qos_requirement might sleep Make PM_QOS and CPU_IDLE play nicer when run with the RT-Preempt kernel. The purpose of the patch is to remove the spin_lock around the read in the function pm_qos_requirement - since spinlocks can sleep in -rt and this function is called from idle. CPU_IDLE polls the target_value's of some of the pm_qos parameters from the idle loop causing sleeping locking warnings. Changing the target_value to an atomic avoids this issue. Remove the spinlock in pm_qos_requirement by making target_value an atomic type. Signed-off-by: mark gross <mgross@linux.intel.com> Signed-off-by: John Kacur <jkacur@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-09-02 19:21:40 -07:00
Oleg Nesterov	950bbabb5a	pid_ns: (BUG 11391) change ->child_reaper when init->group_leader exits We don't change pid_ns->child_reaper when the main thread of the subnamespace init exits. As Robert Rex <robert.rex@exasol.com> pointed out this is wrong. Yes, the re-parenting itself works correctly, but if the reparented task exits it needs ->parent->nsproxy->pid_ns in do_notify_parent(), and if the main thread is zombie its ->nsproxy was already cleared by exit_task_namespaces(). Introduce the new function, find_new_reaper(), which finds the new ->parent for the re-parenting and changes ->child_reaper if needed. Kill the now unneeded exit_child_reaper(). Also move the changing of ->child_reaper from zap_pid_ns_processes() to find_new_reaper(), this consolidates the games with ->child_reaper and makes it stable under tasklist_lock. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=11391 Reported-by: Robert Rex <robert.rex@exasol.com> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-09-02 19:21:38 -07:00
Oleg Nesterov	add0d4dfd6	pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong. Yes, we have already killed all tasks in this namespace, and sys_wait4() doesn't see any child. But this doesn't mean ->children list is empty, we may have EXIT_DEAD tasks which are not visible to do_wait(). In that case the subsequent forget_original_parent() will crash the kernel because it will try to re-parent these tasks to the NULL reaper. Even if there are no childs, it is not good that forget_original_parent() uses reaper == NULL. Change the code to set ->child_reaper = init_pid_ns.child_reaper instead. We could use pid_ns->parent->child_reaper as well, I think this does not really matter. These EXIT_DEAD tasks are not visible to the new ->parent after re-parenting, they will silently do release_task() eventually. Note that we must change ->child_reaper, otherwise forget_original_parent() will use reaper == father, and in that case we will hit the (correct) BUG_ON(!list_empty(&father->children)). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-09-02 19:21:38 -07:00
Linus Torvalds	99039e1352	Merge branch 'audit.b57' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b57' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] audit: Moved variable declaration to beginning of function	2008-09-02 11:04:47 -07:00
Oleg Nesterov	cbaed698f3	softlockup: minor cleanup, don't check task->state twice The recent commit 316d9679f33caf7e683471647d1472bfe133d858 changed check_hung_task() to filter out the TASK_KILLABLE tasks. We can move this check to the caller which has to test t->state anyway. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-09-02 10:49:51 -07:00
Randy Dunlap	6781f4ae30	kernel/resource.c: fix new kernel-doc warning Fix kernel-doc warning for new function: Warning(linux-2.6.27-rc5-git2//kernel/resource.c:448): No description found for parameter 'root' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-09-02 10:47:30 -07:00
Cordelia	c4bacefb7a	[PATCH] audit: Moved variable declaration to beginning of function got rid of compilation warning: ISO C90 forbids mixed declarations and code Signed-off-by: Cordelia Sam <cordesam@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-09-01 23:06:45 -04:00
Linus Torvalds	bef69ea0dc	Resource handling: add 'insert_resource_expand_to_fit()' function Not used anywhere yet, but this complements the existing plain 'insert_resource()' functionality with a version that can expand the resource we are adding in order to fix up any conflicts it has with existing resources. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-29 20:25:20 -07:00
Andi Kleen	316d9679f3	Don't trigger softlockup detector on network fs blocked tasks Pulling the ethernet cable on a 2.6.27-rc system with NFS mounts currently leads to an ongoing flood of soft lockup detector backtraces for all tasks blocked on the NFS mounts when the hiccup takes longer than 120s. I don't think NFS problems should be all that noisy. Luckily there's a reasonably easy way to distinguish this case. Don't report task softlockup warnings for tasks in TASK_KILLABLE state, which is used by the network file systems. I believe this patch is a 2.6.27 candidate. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-29 14:46:29 -07:00
Linus Torvalds	66833d5f39	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: exit signals: use of uninitialized field notify_count lockdep: fix invalid list_del_rcu in zap_class lockstat: repair erronous contention statistics lockstat: fix numerical output rounding error	2008-08-28 12:31:49 -07:00
Linus Torvalds	0234bf1d98	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: rt-bandwidth accounting fix sched: fix sched_rt_rq_enqueue() resched idle	2008-08-28 12:31:12 -07:00
Linus Torvalds	e52c8857e0	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: update defconfigs x86: msr: fix bogus return values from rdmsr_safe/wrmsr_safe x86: cpuid: correct return value on partial operations x86: msr: correct return value on partial operations x86: cpuid: propagate error from smp_call_function_single() x86: msr: propagate errors from smp_call_function_single() smp: have smp_call_function_single() detect invalid CPUs	2008-08-28 12:30:59 -07:00
Rafael J. Wysocki	41108eb101	ftrace: disable tracing for hibernation In accordance with commit `f42ac38c59` ("ftrace: disable tracing for suspend to ram"), disable tracing around the suspend code in hibernation code paths. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-28 12:27:39 -07:00
Peter Zijlstra	cc2991cf15	sched: rt-bandwidth accounting fix It fixes an accounting bug where we would continue accumulating runtime even though the bandwidth control is disabled. This would lead to very long throttle periods once bandwidth control gets turned on again. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-28 13:42:38 +02:00
Ingo Molnar	7940ca3605	sched: extract walk_tg_tree(), fix fix: kernel/sched.c: In function '__rt_schedulable': kernel/sched.c:8771: error: implicit declaration of function 'walk_tg_tree' kernel/sched.c:8771: error: 'tg_nop' undeclared (first use in this function) kernel/sched.c:8771: error: (Each undeclared identifier is reported only once kernel/sched.c:8771: error: for each function it appears in.) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-28 12:08:00 +02:00
Ingo Molnar	aef745fca0	sched: clean up __might_sleep() add KERN_ to the printout and clean up the flow a bit. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-28 11:36:03 +02:00
Joe Korty	29cbef4869	make might_sleep() display the oopsing process Expand might_sleep's printk to indicate the oopsing process. Signed-off-by: Joe Korty <joe.korty@ccur.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-28 11:36:02 +02:00
Bharata B Rao	aec0a5142c	sched: call resched_task() conditionally from new task wake up path - During wake up of a new task, task_new_fair() can do a resched_task() on the current task. Later in the code path, check_preempt_curr() also ends up doing the same, which can be avoided. Check if TIF_NEED_RESCHED is already set for the current task. - task_new_fair() does a resched_task() on the current task unconditionally. This can be done only in case when child runs before the parent. So this is a small speedup. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-28 11:35:51 +02:00
John Blackwood	f3ade83780	sched: fix sched_rt_rq_enqueue() resched idle When sysctl_sched_rt_runtime is set to something other than -1 and the CONFIG_RT_GROUP_SCHED kernel parameter is NOT enabled, we get into a state where we see one or more CPUs idling forever even though there are real-time tasks in their rt runqueue that are able to run (no longer throttled). The sequence is: - A real-time task is running when the timer sets the rt runqueue to throttled, and the rt task is resched_task()ed and switched out, and idle is switched in since there are no non-rt tasks to run on that cpu. - Eventually the do_sched_rt_period_timer() runs and un-throttles the rt runqueue, but we just exit the timer interrupt and go back to executing the idle task in the idle loop forever. If we change the sched_rt_rq_enqueue() routine to use some of the code from the CONFIG_RT_GROUP_SCHED enabled version of this same routine and resched_task() the currently executing task (idle in our case) if it is a lower priority task than the higher rt task in the now un-throttled runqueue, the problem is no longer observed. Signed-off-by: John Blackwood <john.blackwood@ccur.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-28 11:13:24 +02:00
Steven Rostedt	f42ac38c59	ftrace: disable tracing for suspend to ram I've been painstakingly debugging the issue with suspend to ram and ftraced. The 2.6.28 code does not have this issue, but since the mcount recording is not going to be in 27, this must be solved for the ftrace daemon version. The resume from suspend to ram would reboot because it was triple faulting. Debugging further, I found that calling the mcount function itself was not an issue, but it would fault when it incremented preempt_count. preempt_count is on the tasks info structure that is on the low memory address of the task's stack. For some reason, it could not write to it. Resuming out of suspend to ram does quite a lot of funny tricks to get to work, so it is not surprising at all that simply doing a preempt_disable() would cause a fault. Thanks to Rafael for suggesting to add a "while (1);" to find the place in resuming that is causing the fault. I would place the loop somewhere in the code, compile and reboot and see if it would either reboot (hit the fault) or simply hang (hit the loop). Doing this over and over again, I narrowed it down that it was happening in enable_nonboot_cpus. At this point, I found that it is easier to simply disable tracing around the suspend code, instead of searching for the particular function that can not handle doing a preempt_disable. This patch disables the tracer as it suspends and reenables it on resume. I tested this patch on my Laptop, and it can resume fine with the patch. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-27 13:54:20 -07:00
Hiroshi Shimamoto	0cd418ddb1	rcuclassic: fix compiler warning CC kernel/rcuclassic.o kernel/rcuclassic.c: In function 'rcu_init_percpu_data': kernel/rcuclassic.c:705: warning: comparison of distinct pointer types lacks a cast kernel/rcuclassic.c:713: warning: comparison of distinct pointer types lacks a cast flags should be unsigned long. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-27 09:28:07 +02:00
Steve VanDeBogart	2633f0e57b	exit signals: use of uninitialized field notify_count task->signal->notify_count is only initialized if task->signal->group_exit_task is not NULL. Reorder a conditional so that uninitialised memory is not used. Found by Valgrind. Signed-off-by: Steve VanDeBogart <vandebo-lkml@nerdbox.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-27 09:10:09 +02:00
Zhu Yi	7487017282	lockdep: fix invalid list_del_rcu in zap_class The problem is found during iwlagn driver testing on v2.6.27-rc4-176-gb8e6c91 kernel, but it turns out to be a lockdep bug. In our testing, we frequently load and unload the iwlagn driver (>50 times). Then the MAX_STACK_TRACE_ENTRIES is reached (expected behaviour?). The error message with the call trace is as below. BUG: MAX_STACK_TRACE_ENTRIES too low! turning off the locking correctness validator. Pid: 4895, comm: iwlagn Not tainted 2.6.27-rc4 #13 Call Trace: [<ffffffff81014aa1>] save_stack_trace+0x22/0x3e [<ffffffff8105390a>] save_trace+0x8b/0x91 [<ffffffff81054e60>] mark_lock+0x1b0/0x8fa [<ffffffff81056f71>] __lock_acquire+0x5b9/0x716 [<ffffffffa00d818a>] ieee80211_sta_work+0x0/0x6ea [mac80211] [<ffffffff81057120>] lock_acquire+0x52/0x6b [<ffffffff81045f0e>] run_workqueue+0x97/0x1ed [<ffffffff81045f5e>] run_workqueue+0xe7/0x1ed [<ffffffff81045f0e>] run_workqueue+0x97/0x1ed [<ffffffff81046ae4>] worker_thread+0xd8/0xe3 [<ffffffff81049503>] autoremove_wake_function+0x0/0x2e [<ffffffff81046a0c>] worker_thread+0x0/0xe3 [<ffffffff810493ec>] kthread+0x47/0x73 [<ffffffff8128e3ab>] trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8100cea9>] child_rip+0xa/0x11 [<ffffffff8100c4df>] restore_args+0x0/0x30 [<ffffffff810316e1>] finish_task_switch+0x0/0xcc [<ffffffff810493a5>] kthread+0x0/0x73 [<ffffffff8100ce9f>] child_rip+0x0/0x11 Although the above is harmless, when the ilwagn module is removed later lockdep will trigger a kernel oops as below. 
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff810531e1>] zap_class+0x24/0x82 PGD 73128067 PUD 7448c067 PMD 0 Oops: 0002 [1] SMP CPU 0 Modules linked in: rfcomm l2cap bluetooth autofs4 sunrpc nf_conntrack_ipv6 xt_state nf_conntrack xt_tcpudp ip6t_ipv6header ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_mirror dm_log dm_multipath dm_mod snd_hda_intel sr_mod snd_seq_dummy snd_seq_oss snd_seq_midi_event battery snd_seq snd_seq_device cdrom button snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc e1000e snd_hwdep sg iTCO_wdt iTCO_vendor_support ac pcspkr i2c_i801 i2c_core snd soundcore video output ata_piix ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: mac80211] Pid: 4941, comm: modprobe Not tainted 2.6.27-rc4 #10 RIP: 0010:[<ffffffff810531e1>] [<ffffffff810531e1>] zap_class+0x24/0x82 RSP: 0000:ffff88007bcb3eb0 EFLAGS: 00010046 RAX: 0000000000068ee8 RBX: ffffffff8192a0a0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000001dfb RDI: ffffffff816e70b0 RBP: ffffffffa00cd000 R08: ffffffff816818f8 R09: ffff88007c923558 R10: ffffe20002ad2408 R11: ffffffff811028ec R12: ffffffff8192a0a0 R13: 000000000002bd90 R14: 0000000000000000 R15: 0000000000000296 FS: 00007f9d1cee56f0(0000) GS:ffffffff814a58c0(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000008 CR3: 0000000073047000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 4941, threadinfo ffff88007bcb2000, task ffff8800758d1fc0) Stack: ffffffff81057376 0000000000000000 ffffffffa00f7b00 0000000000000000 0000000000000080 0000000000618278 00007fff24f16720 0000000000000000 ffffffff8105d37a ffffffffa00f7b00 ffffffff8105d591 313132303863616d Call Trace: [<ffffffff81057376>] ? 
lockdep_free_key_range+0x61/0xf5 [<ffffffff8105d37a>] ? free_module+0xd4/0xe4 [<ffffffff8105d591>] ? sys_delete_module+0x1de/0x1f9 [<ffffffff8106dbfa>] ? audit_syscall_entry+0x12d/0x160 [<ffffffff8100be2b>] ? system_call_fastpath+0x16/0x1b Code: b2 00 01 00 00 00 c3 31 f6 49 c7 c0 10 8a 61 81 eb 32 49 39 38 75 26 48 98 48 6b c0 38 48 8b 90 08 8a 61 81 48 8b 88 00 8a 61 81 <48> 89 51 08 48 89 0a 48 c7 80 08 8a 61 81 00 02 20 00 48 ff c6 RIP [<ffffffff810531e1>] zap_class+0x24/0x82 RSP <ffff88007bcb3eb0> CR2: 0000000000000008 ---[ end trace a1297e0c4abb0f2e ]--- The root cause of this oops is in add_lock_to_list(): when save_trace() fails because MAX_STACK_TRACE_ENTRIES has been reached, entry->class is assigned but entry is never added to any lock list. This makes the list_del_rcu() in zap_class() oops later when the module is unloaded. This patch fixes the problem by assigning entry->class only after save_trace() returns success. Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-27 08:40:36 +02:00
Joe Korty	04148b73b8	lockstat: repair erroneous contention statistics Fix bad contention counting in /proc/lock_stat. /proc/lock_stat tries to gather per-ip contention statistics per-lock. This was failing due to a garbage per-ip index selector being used. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-26 10:37:47 +02:00
Joe Korty	2189459d25	lockstat: fix numerical output rounding error Fix rounding error in /proc/lock_stat numerical output. On occasion the two digit fractional part contains the three digit value '100'. This is due to a bug in the rounding algorithm which pushes values in the range '95..99' to '100' rather than to '00' + an increment to the integer part. For example, - 123456.100 old display + 123457.00 new display Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-26 10:37:46 +02:00
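The rounding bug described in the lockstat commit above can be illustrated with a minimal sketch. It assumes the value is held as an integer count of thousandths and printed with two fractional digits; the function names `format_buggy` and `format_fixed` are hypothetical and this is not the kernel's actual code.

```c
#include <stdio.h>

/*
 * Hypothetical sketch: nr is a value in thousandths, printed as
 * "<integer>.<two digits>".
 *
 * Buggy variant: split first, then round the remainder. A remainder
 * of 995..999 rounds up to 100, which prints as three digits.
 */
int format_buggy(char *buf, size_t size, long long nr)
{
    long long whole = nr / 1000;
    int rem = (int)(nr % 1000);

    return snprintf(buf, size, "%lld.%02d", whole, (rem + 5) / 10);
}

/*
 * Fixed variant: add the rounding increment before splitting, so a
 * carry propagates into the integer part.
 */
int format_fixed(char *buf, size_t size, long long nr)
{
    long long whole;
    int rem;

    nr += 5; /* round to the nearest hundredth */
    whole = nr / 1000;
    rem = (int)(nr % 1000);

    return snprintf(buf, size, "%lld.%02d", whole, rem / 10);
}
```

With an input of 123456997 thousandths, the first variant produces the "123456.100" display quoted in the commit message and the second produces "123457.00".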
Kevin Diggs	65eb3dc609	sched: add kernel doc for the completion, fix kernel-doc-nano-HOWTO.txt This patch adds kernel doc for the completion feature. An error in the split-man.pl PERL snippet in kernel-doc-nano-HOWTO.txt is also fixed. Signed-off-by: Kevin Diggs <kevdig@hypersurf.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-26 10:26:54 +02:00
Ingo Molnar	3cf430b063	Merge branch 'linus' into sched/devel	2008-08-26 10:25:59 +02:00
H. Peter Anvin	f73be6dedf	smp: have smp_call_function_single() detect invalid CPUs Have smp_call_function_single() detect invalid CPU indices and return -ENXIO. This function is already executed inside a get_cpu()..put_cpu() which locks out CPU removal, so rather than having the higher layers do another layer of locking to guard against unplugged CPUs, do the test here. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-08-25 17:45:48 -07:00
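The check described in the smp_call_function_single() commit above might look roughly like the following userspace sketch. Everything here is an assumption made for illustration: `SKETCH_NR_CPUS`, `sketch_cpu_online`, `sketch_call_single` and `sketch_bump` are hypothetical names, and the "call" simply runs the function locally instead of sending a cross-CPU request.

```c
#include <errno.h>

#define SKETCH_NR_CPUS 4 /* hypothetical CPU count */

/* Hypothetical online map: CPU 2 is treated as unplugged. */
static const int sketch_cpu_online[SKETCH_NR_CPUS] = { 1, 1, 0, 1 };

/* Example callback: increments the int that info points to. */
void sketch_bump(void *info)
{
    (*(int *)info)++;
}

/*
 * Reject out-of-range or offline CPUs with -ENXIO before doing any
 * work, so callers do not need their own guard against unplugged CPUs.
 */
int sketch_call_single(int cpu, void (*func)(void *), void *info)
{
    if (cpu < 0 || cpu >= SKETCH_NR_CPUS || !sketch_cpu_online[cpu])
        return -ENXIO;

    func(info); /* stand-in for running func on the target CPU */
    return 0;
}
```

A caller would then check the return value instead of validating the CPU number itself.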
Linus Torvalds	cc556c5c92	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched_clock: fix cpu_clock()	2008-08-25 11:26:02 -07:00
Linus Torvalds	ffb4ba76a2	[module] Don't let gcc inline load_module() 'load_module()' is a complex function that contains all the ELF section logic, and inlining it is utterly insane. But gcc will do it, simply because there is only one call-site. As a result, all the stack space that is allocated for all the work to load the module will still be active when we actually call the module init sequence, and the deep call chain makes stack overflows happen. And stack overflows are really hard to debug, because they not only corrupt random pages below the stack, but also corrupt the thread_info structure that is allocated under the stack. In this case, Alan Brunelle reported some crazy oopses at bootup, after loading the processor module that ends up doing complex ACPI stuff and has quite a deep callchain. This should fix it, and is the sane thing to do regardless. Cc: Alan D. Brunelle <Alan.Brunelle@hp.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-25 11:10:26 -07:00
Peter Zijlstra	354879bb97	sched_clock: fix cpu_clock() This patch fixes 3 issues: a) it removes the dependency on jiffies, because jiffies are incremented by a single CPU, and the tick is not synchronized between CPUs. Therefore relying on it to calculate a window to clip whacky TSC values doesn't work as it can drift around. So instead use [GTOD, GTOD+TICK_NSEC) as the window. b) __update_sched_clock() did (roughly speaking): delta = sched_clock() - scd->tick_raw; clock += delta; Which gives exponential growth, instead of linear. c) allows the sched_clock_cpu() value to warp the u64 without breaking. the results are more reliable sched_clock() deltas: before after sched_clock cpu_clock: 15750 51312 51488 cpu_clock: 59719 51052 50947 cpu_clock: 15879 51249 51061 cpu_clock: 1 50933 51198 cpu_clock: 1 50931 51039 cpu_clock: 1 51093 50981 cpu_clock: 1 51043 51040 cpu_clock: 1 50959 50938 cpu_clock: 1 50981 51011 cpu_clock: 1 51364 51212 cpu_clock: 1 51219 51273 cpu_clock: 1 51389 51048 cpu_clock: 1 51285 51611 cpu_clock: 1 50964 51137 cpu_clock: 1 50973 50968 cpu_clock: 1 50967 50972 cpu_clock: 1 58910 58485 cpu_clock: 1 51082 51025 cpu_clock: 1 50957 50958 cpu_clock: 1 50958 50957 cpu_clock: 1006128 51128 50971 cpu_clock: 1 51107 51155 cpu_clock: 1 51371 51081 cpu_clock: 1 51104 51365 cpu_clock: 1 51363 51309 cpu_clock: 1 51107 51160 cpu_clock: 1 51139 51100 cpu_clock: 1 51216 51136 cpu_clock: 1 51207 51215 cpu_clock: 1 51087 51263 cpu_clock: 1 51249 51177 cpu_clock: 1 51519 51412 cpu_clock: 1 51416 51255 cpu_clock: 1 51591 51594 cpu_clock: 1 50966 51374 cpu_clock: 1 50966 50966 cpu_clock: 1 51291 50948 cpu_clock: 1 50973 50867 cpu_clock: 1 50970 50970 cpu_clock: 998306 50970 50971 cpu_clock: 1 50971 50970 cpu_clock: 1 50970 50970 cpu_clock: 1 50971 50971 cpu_clock: 1 50970 50970 cpu_clock: 1 51351 50970 cpu_clock: 1 50970 51352 cpu_clock: 1 50971 50970 cpu_clock: 1 50970 50970 cpu_clock: 1 51321 50971 cpu_clock: 1 50974 51324 Signed-off-by: Peter Zijlstra 
<a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-25 17:39:57 +02:00
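Point (a) of the sched_clock commit above, clipping a raw clock delta to a window that starts at GTOD and spans one tick, can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the helper names and the `clamp_clock` signature are hypothetical, the comparisons use a signed difference so they keep working when the u64 wraps, and the upper bound is treated as inclusive for simplicity.

```c
#include <stdint.h>

/* Wrap-safe min/max: compare via the signed difference of the values. */
uint64_t wrap_min(uint64_t x, uint64_t y)
{
    return (int64_t)(x - y) < 0 ? x : y;
}

uint64_t wrap_max(uint64_t x, uint64_t y)
{
    return (int64_t)(x - y) > 0 ? x : y;
}

/*
 * Clip gtod + raw_delta to the window [gtod, gtod + tick_ns].
 * A wild raw_delta (too large, or negative when viewed as signed)
 * cannot push the result outside one tick past gtod.
 */
uint64_t clamp_clock(uint64_t gtod, uint64_t tick_ns, uint64_t raw_delta)
{
    uint64_t clock = gtod + raw_delta;

    clock = wrap_max(clock, gtod);
    clock = wrap_min(clock, gtod + tick_ns);
    return clock;
}
```

Because the clock is recomputed from the window base each time, rather than adding the same growing delta repeatedly, the result advances linearly with the tick.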
Adrian Bunk	7a8fc9b248	removed unused #include <linux/version.h>'s This patch lets the files using linux/version.h match the files that #include it. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-23 12:14:12 -07:00
Linus Torvalds	43cc071db8	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: enable LB_BIAS by default	2008-08-22 08:36:55 -07:00
Linus Torvalds	05f57f50e0	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: fix synchronize_rcu() so that kernel-doc works	2008-08-22 08:36:42 -07:00
Oleg Nesterov	93dcf55f82	wait_task_inactive: "improve" the returned value for ->nvcsw == 0 wait_task_inactive() returns 1 when p->nvcsw == 0 || p->nvcsw == 1. This means that two subsequent calls can return the same number while the task was scheduled in between. Change the code to return "nvcsw | LONG_MIN" instead of "nvcsw ?: 1", now the overlap always needs LONG_MAX schedules. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-22 15:17:31 +02:00
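The two return-value encodings compared in the wait_task_inactive commit above can be sketched in isolation. The function names `ncsw_old` and `ncsw_new` are hypothetical; only the two expressions come from the commit message.

```c
#include <limits.h>

/*
 * Old encoding ("nvcsw ?: 1"): zero is reserved for failure, so a
 * count of 0 is reported as 1 and collides with a real count of 1.
 */
unsigned long ncsw_old(unsigned long nvcsw)
{
    return nvcsw ? nvcsw : 1;
}

/*
 * New encoding ("nvcsw | LONG_MIN"): setting the top bit makes the
 * result non-zero while keeping small counts distinct from each other.
 */
unsigned long ncsw_new(unsigned long nvcsw)
{
    return nvcsw | (unsigned long)LONG_MIN;
}
```

With the old form, counts 0 and 1 map to the same value; with the new form they differ, and two counts only collide once they differ in nothing but the top bit.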
Oleg Nesterov	f31e11d87a	wait_task_inactive(): don't consider task->nivcsw If wait_task_inactive() returns success the task was deactivated. In that case schedule() always increments ->nvcsw which alone can be used as a "generation counter". If the next call returns the same number, we can be sure that the task was unscheduled. Otherwise, because we know that .on_rq == 0 again, ->nvcsw should have been changed in between. Q: perhaps it is better to do "ncsw = (p->nvcsw << 1) | 1" ? This decreases the possibility of "was it unscheduled" false positive when ->nvcsw == 0. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-22 15:17:29 +02:00
Oleg Nesterov	94d3d8247d	sched: do_wait_for_common: use signal_pending_state() Change do_wait_for_common() to use signal_pending_state() instead of open coding. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-22 15:17:28 +02:00
Paul E. McKenney	275a89bdd3	rcu: use irq-safe locks Some earlier tip/core/rcu patches caused RCU to incorrectly enable irqs too early in boot. This caused Yinghai's repeated-kexec testing to hit oopses, presumably due to device interrupts left over from the prior kernel instance (which would oops the newly booting kernel before it got a chance to reset said devices). This patch therefore converts all the local_irq_disable()s in rcuclassic.c to local_irq_save(). Besides, I never did like local_irq_disable() anyway. ;-) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-21 16:01:02 +02:00
Miao Xie	3c4fbe5e01	nohz: fix wrong event handler after onlining an offlined cpu On a tickless system (CONFIG_NO_HZ=y and CONFIG_HIGH_RES_TIMERS=n), after I brought an offlined cpu back online, I found that this cpu's event handler was tick_handle_periodic, not tick_nohz_handler. After debugging, I found that this bug was caused by the wrong tick mode: the tick mode is not changed to NOHZ_MODE_INACTIVE when the cpu goes offline. This patch fixes this bug. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-21 09:54:06 +02:00
Randy Dunlap	01dcb0443e	rcu: fix synchronize_rcu() so that kernel-doc works Fix RCU's synchronize_rcu() so that it looks like a C function, enabling it to be recognized as a function with kernel-doc annotation. Warning(linux-2.6.26-git11//kernel/rcupdate.c:81): No description found for parameter 'synchronize_rcu' Warning(linux-2.6.26-git11//kernel/rcupdate.c:81): No description found for parameter 'call_rcu' [akpm@linux-foundation.org: fix comment] Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-21 09:31:44 +02:00
Peter Zijlstra	efc2dead2c	sched: enable LB_BIAS by default Yanmin reported a significant regression on his 16-core machine due to: commit `93b75217df` Author: Peter Zijlstra <a.p.zijlstra@chello.nl> Date: Fri Jun 27 13:41:33 2008 +0200 Flip back to the old behaviour. Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-21 08:18:02 +02:00
Ken Chen	2d70b68d42	fix setpriority(PRIO_PGRP) thread iterator breakage When a user calls sys_setpriority(PRIO_PGRP ...) on an NPTL-style multi-LWP process, only the task leader of the process is affected; the other sibling LWP threads do not receive the setting. The problem was that the iterator used in sys_setpriority() only iterates over one task for each process, ignoring all other sibling threads. Introduce a new macro pair do_each_pid_thread / while_each_pid_thread to walk each thread of a process. Convert 4 call sites in {set/get}priority and ioprio_{set/get}. Signed-off-by: Ken Chen <kenchen@google.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-20 15:40:32 -07:00
Roland McGrath	1b04624f93	tracehook: fix SA_NOCLDWAIT I outwitted myself again in commit `2b2a1ff64a`, and broke the SA_NOCLDWAIT behavior so it leaks zombies. This fixes it. Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Roland McGrath <roland@redhat.com>	2008-08-19 20:37:07 -07:00
Peter Zijlstra	9a7e0b180d	sched: rt-bandwidth fixes The last patch allows sysctl_sched_rt_runtime to disable bandwidth accounting for the group scheduler - however it doesn't deal with sched_setscheduler(), which will keep tasks out of groups that have no assigned runtime. If we relax this, we get into the situation where RT tasks can get into a group when we disable bandwidth control, and then starve them by enabling it again. Rework the schedulability code to check for this condition and fail to turn on bandwidth control with -EBUSY when this situation is found. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-19 13:10:12 +02:00
Peter Zijlstra	eb755805f2	sched: extract walk_tg_tree() Extract walk_tg_tree() and make it a little more generic so we can use it in the schedulability test. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-19 13:10:11 +02:00
Peter Zijlstra	0b148fa048	sched: rt-bandwidth group disable fixes More extensive disable of bandwidth control. It allows sysctl_sched_rt_runtime to disable full group bandwidth control. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-19 13:10:10 +02:00
Peter Zijlstra	6f0d5c390e	sched: rt-bandwidth accounting fix It fixes an accounting bug where we would continue accumulating runtime even though the bandwidth control is disabled. This would lead to very long throttle periods once bandwidth control gets turned on again. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-19 13:10:09 +02:00
Peter Zijlstra	af4491e516	sched: rt-bandwidth for user grouping interface rt_runtime is a signed value Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-19 13:10:09 +02:00
Hiroshi Shimamoto	0c925d7923	rcuclassic: fix compilation NG fix: CC kernel/rcuclassic.o kernel/rcuclassic.c: In function '__rcu_process_callbacks': kernel/rcuclassic.c:561: error: 'flags' undeclared (first use in this function) kernel/rcuclassic.c:561: error: (Each undeclared identifier is reported only once kernel/rcuclassic.c:561: error: for each function it appears in.) Declare missing variable flags. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-19 11:03:33 +02:00
Paul E. McKenney	eff9b713ee	rcu: fix locking cleanup fallout Given that the rcp->lock is now acquired from call_rcu(), which can be invoked from irq-disable regions, all acquisitions need to disable irqs. The following patch fixes this. Although I don't have any reason to believe that this is the cause of Yinghai's oops, it does need to be fixed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-19 04:15:36 +02:00
Paul E. McKenney	ded00a56e9	rcu: remove redundant ACCESS_ONCE definition from rcupreempt.c Remove the redundant definition of ACCESS_ONCE() from rcupreempt.c in favor of the one in compiler.h. Also merge the comment header from rcupreempt.c's definition into that in compiler.h. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-18 09:45:22 +02:00
Dmitry Baryshkov	6951b12a0f	lockdep: fix spurious 'inconsistent lock state' warning Since `f82b217e35` lockdep can output spurious warnings related to hwirqs due to hardirq_off shrinkage from int to bit-sized flag. Guard it with double negation to fix the warning. Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-18 09:42:31 +02:00
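The class of bug named in the lockdep commit above, an int-valued flag stored into a field that shrank to a single bit, can be reproduced in a small sketch. The struct and function names are hypothetical, and the exact expression changed by the real patch is not shown in the log; this only illustrates why a double negation helps.

```c
/* Hypothetical stand-in for a structure whose flag shrank from int to one bit. */
struct sketch_held_lock {
    unsigned int hardirqs_off:1;
};

/*
 * Storing a raw integer keeps only its lowest bit, so a non-zero
 * value with bit 0 clear (for example 0x200) is stored as 0.
 */
unsigned int store_raw(int value)
{
    struct sketch_held_lock h;

    h.hardirqs_off = value;
    return h.hardirqs_off;
}

/* Double negation first normalises any non-zero value to exactly 1. */
unsigned int store_normalised(int value)
{
    struct sketch_held_lock h;

    h.hardirqs_off = !!value;
    return h.hardirqs_off;
}
```

A later comparison between the stored bit and a freshly computed state would then disagree for the raw store, which matches the kind of spurious warning the commit describes.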
Paul E. McKenney	cd95851785	rcu: fix classic RCU locking cleanup lockdep problem On Fri, Aug 15, 2008 at 04:24:30PM +0200, Ingo Molnar wrote: > > Paul, > > one of your two recent RCU patches caused this lockdep splat in -tip > testing: > > -------------------> > Brought up 2 CPUs > Total of 2 processors activated (6850.87 BogoMIPS). > PM: Adding info for No Bus:platform > khelper used greatest stack depth: 3124 bytes left > > ================================= > [ INFO: inconsistent lock state ] > 2.6.27-rc3-tip #1 > --------------------------------- > inconsistent {softirq-on-W} -> {in-softirq-W} usage. > ksoftirqd/0/4 [HC0[0]:SC1[1]:HE1:SE0] takes: > (&rcu_ctrlblk.lock){-+..}, at: [<c016d91c>] __rcu_process_callbacks+0x1ac/0x1f0 > {softirq-on-W} state was registered at: > [<c01528e4>] __lock_acquire+0x3f4/0x5b0 > [<c0152b29>] lock_acquire+0x89/0xc0 > [<c076142b>] _spin_lock+0x3b/0x70 > [<c016d649>] rcu_init_percpu_data+0x29/0x80 > [<c075e43f>] rcu_cpu_notify+0xaf/0xd0 > [<c076458d>] notifier_call_chain+0x2d/0x60 > [<c0145ede>] __raw_notifier_call_chain+0x1e/0x30 > [<c075db29>] _cpu_up+0x79/0x110 > [<c075dc0d>] cpu_up+0x4d/0x70 > [<c0a769e1>] kernel_init+0xb1/0x200 > [<c01048a3>] kernel_thread_helper+0x7/0x10 > [<ffffffff>] 0xffffffff > irq event stamp: 14 > hardirqs last enabled at (14): [<c01534db>] trace_hardirqs_on+0xb/0x10 > hardirqs last disabled at (13): [<c014dbeb>] trace_hardirqs_off+0xb/0x10 > softirqs last enabled at (0): [<c012b186>] copy_process+0x276/0x1190 > softirqs last disabled at (11): [<c0105c0a>] call_on_stack+0x1a/0x30 > > other info that might help us debug this: > no locks held by ksoftirqd/0/4. > > stack backtrace: > Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.27-rc3-tip #1 > [<c01504dc>] print_usage_bug+0x16c/0x1b0 > [<c0152455>] mark_lock+0xa75/0xb10 > [<c0108b75>] ? sched_clock+0x15/0x30 > [<c015289d>] __lock_acquire+0x3ad/0x5b0 > [<c0152b29>] lock_acquire+0x89/0xc0 > [<c016d91c>] ? 
__rcu_process_callbacks+0x1ac/0x1f0 > [<c076142b>] _spin_lock+0x3b/0x70 > [<c016d91c>] ? __rcu_process_callbacks+0x1ac/0x1f0 > [<c016d91c>] __rcu_process_callbacks+0x1ac/0x1f0 > [<c016d986>] rcu_process_callbacks+0x26/0x50 > [<c0132305>] __do_softirq+0x95/0x120 > [<c0132270>] ? __do_softirq+0x0/0x120 > [<c0105c0a>] call_on_stack+0x1a/0x30 > [<c0132426>] ? ksoftirqd+0x96/0x110 > [<c0132390>] ? ksoftirqd+0x0/0x110 > [<c01411f7>] ? kthread+0x47/0x80 > [<c01411b0>] ? kthread+0x0/0x80 > [<c01048a3>] ? kernel_thread_helper+0x7/0x10 > ======================= > calling init_cpufreq_transition_notifier_list+0x0/0x20 > initcall init_cpufreq_transition_notifier_list+0x0/0x20 returned 0 after 0 msecs > calling net_ns_init+0x0/0x190 > net_namespace: 676 bytes > initcall net_ns_init+0x0/0x190 returned 0 after 0 msecs > calling cpufreq_tsc+0x0/0x20 > initcall cpufreq_tsc+0x0/0x20 returned 0 after 0 msecs > calling reboot_init+0x0/0x20 > initcall reboot_init+0x0/0x20 returned 0 after 0 msecs > calling print_banner+0x0/0x10 > Booting paravirtualized kernel on bare hardware > > <----------------------- > > my guess is on: > > commit `1f7b94cd3d` > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Date: Tue Aug 5 09:21:44 2008 -0700 > > rcu: classic RCU locking and memory-barrier cleanups > > Ingo Fixes a problem detected by lockdep in which rcu->lock was acquired both in irq context and in process context, but without disabling from process context. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-17 17:38:01 +02:00
Linus Torvalds	406703f8de	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: lockdep: fix build if CONFIG_PROVE_LOCKING not defined lockdep: use WARN() in kernel/lockdep.c lockdep: spin_lock_nest_lock(), checkpatch fixes lockdep: build fix	2008-08-16 17:16:07 -07:00
Linus Torvalds	c100548d46	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: scale sysctl_sched_shares_ratelimit with nr_cpus sched: fix rt-bandwidth hotplug race sched: fix the race between walk_tg_tree and sched_create_group	2008-08-16 17:15:32 -07:00
Linus Torvalds	71ef2a46fc	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: security: Fix setting of PF_SUPERPRIV by __capable()	2008-08-15 15:32:13 -07:00
Stephen Hemminger	df60a84418	lockdep: fix build if CONFIG_PROVE_LOCKING not defined If CONFIG_PROVE_LOCKING not defined, then no dependency information is available. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-15 19:22:04 +02:00
Peter Zijlstra	55cd53404c	sched: scale sysctl_sched_shares_ratelimit with nr_cpus David reported that his Niagara spends a little too much time in tg_shares_up(), which makes sense considering its large cpu count. So scale the ratelimit value with the number of cpus, as we do for other controls. Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-15 18:25:07 +02:00
Steven Rostedt	5802294f1b	rcu: trace fix possible mem-leak In the initialization of the RCU trace module, if rcupreempt_debugfs_init() fails, we never free the trace buffer. This patch frees the trace buffer in case the debugfs initialization fails. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-15 17:54:40 +02:00
Dave Chinner	be4de35263	completions: uninline try_wait_for_completion and completion_done m68k fails to build with these functions inlined in completion.h. Move them out of line into sched.c and export them to avoid this problem. Signed-off-by: Dave Chinner <david@fromorbit.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-15 08:35:44 -07:00
Andrew Morton	8c5a1cf0ad	kexec: use a mutex for locking rather than xchg() Functionally the same, but more conventional. Cc: Huang Ying <ying.huang@intel.com> Tested-by: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-15 08:35:43 -07:00
Huang Ying	3122c33119	kexec jump: fix for ftrace Ftrace depends on some processor state that is destroyed during kexec and restored by restore_processor_state(). So save_processor_state() and restore_processor_state() are moved into machine_kexec(), and ftrace is restored after restore_processor_state(). Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-15 08:35:43 -07:00
Huang Ying	73bd9c72a2	kexec jump: in sync with hibernation implementation Add device_pm_lock() and device_pm_unlock() in kernel_kexec() in sync with current hibernation implementation. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-15 08:35:42 -07:00
Huang Ying	ca195b7f6d	kexec jump: remove duplication of kexec_restart_prepare() Call kernel_restart_prepare() in kernel_kexec() instead of duplicating the code. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Pavel Machek <pavel@suse.cz> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-15 08:35:42 -07:00
Huang Ying	163f6876f5	kexec jump: rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because the control page is used for more than code on some platforms. For example, in kexec jump it is used for data and stack too. [akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion] Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-15 08:35:42 -07:00
Huang Ying	7ade3fcc1f	kexec jump: clean up #ifdef and comments Move if (kexec_image->preserve_context) { ... } into #ifdef CONFIG_KEXEC_JUMP to make the code look cleaner. Fix the no-longer-correct comments of kernel_kexec(). Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-15 08:35:42 -07:00
Huang Ying	4cd69b986e	kexec: fix compilation warning on xchg(&kexec_lock, 0) in kernel_kexec() kernel/kexec.c: In function 'kernel_kexec': kernel/kexec.c:1506: warning: value computed is not used Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-15 08:35:42 -07:00
Paul E. McKenney	1f7b94cd3d	rcu: classic RCU locking and memory-barrier cleanups This patch simplifies the locking and memory-barrier usage in the Classic RCU grace-period-detection mechanism, incorporating Lai Jiangshan's feedback from the earlier version (http://lkml.org/lkml/2008/8/1/400 and http://lkml.org/lkml/2008/8/3/43). Passed 10 hours of rcutorture concurrent with CPUs being put online and taken offline on a 128-hardware-thread Power machine. My apologies to whoever in the Eastern Hemisphere was planning to use this machine over the Western Hemisphere night, but it was sitting idle and... So this is ready for tip/core/rcu. This patch is in preparation for moving to a hierarchical algorithm to allow the very large SMP machines -- requested by some people at OLS, and there seem to have been a few recent patches in the 4096-CPU direction as well. The general idea is to move to a much more conservative concurrency design, then apply a hierarchy to reduce contention on the global lock by a few orders of magnitude (larger machines would see greater reductions). The reason for taking a conservative approach is that this code isn't on any fast path. Prototype in progress. This patch is against the linux-tip git tree (tip/core/rcu). If you wish to test this against 2.6.26, use the following set of patches: http://www.rdrop.com/users/paulmck/patches/2.6.26-ljsimp-1.patch http://www.rdrop.com/users/paulmck/patches/2.6.26-ljsimpfix-3.patch The first patch combines commits `5127bed588` and `3cac97cbb1` from Lai Jiangshan <laijs@cn.fujitsu.com>, and the second patch contains my changes. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-15 16:08:47 +02:00
Paul E. McKenney	293a17ebc9	rcu: prevent console flood when one CPU sees another AWOL via RCU One small change needed to keep from flooding the console when one CPU notices that another is AWOL. Unless I am missing something subtle. Otherwise the cleanups look good! Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-15 15:08:58 +02:00
Peter Zijlstra	f1679d0848	sched: fix rt-bandwidth hotplug race When we hot-unplug a cpu and rebuild the sched-domain, all cpus will be detached. Alex observed the case where a runqueue was stealing bandwidth from an already disabled runqueue to satisfy its own needs. Stop this by skipping over already disabled runqueues. Reported-by: Alex Nixon <alex.nixon@citrix.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Alex Nixon <alex.nixon@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-14 15:50:58 +02:00
David Howells	5cd9c58fbe	security: Fix setting of PF_SUPERPRIV by __capable() Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags of the target process if that is not the current process and it is trying to change its own flags in a different way at the same time. __capable() is using neither atomic ops nor locking to protect t->flags. This patch removes __capable() and introduces has_capability() that doesn't set PF_SUPERPRIV on the process being queried. This patch further splits security_ptrace() in two: (1) security_ptrace_may_access(). This passes judgement on whether one process may access another only (PTRACE_MODE_ATTACH for ptrace() and PTRACE_MODE_READ for /proc), and takes a pointer to the child process. current is the parent. (2) security_ptrace_traceme(). This passes judgement on PTRACE_TRACEME only, and takes only a pointer to the parent process. current is the child. In Smack and commoncap, this uses has_capability() to determine whether the parent will be permitted to use PTRACE_ATTACH if normal checks fail. This does not set PF_SUPERPRIV. Two of the instances of __capable() actually only act on current, and so have been changed to calls to capable(). Of the places that were using __capable(): (1) The OOM killer calls __capable() thrice when weighing the killability of a process. All of these now use has_capability(). (2) cap_ptrace() and smack_ptrace() were using __capable() to check to see whether the parent was allowed to trace any process. As mentioned above, these have been split. For PTRACE_ATTACH and /proc, capable() is now used, and for PTRACE_TRACEME, has_capability() is used. (3) cap_safe_nice() only ever saw current, so now uses capable(). (4) smack_setprocattr() rejected accesses to tasks other than current just after calling __capable(), so the order of these two tests has been switched and capable() is used instead.
(5) In smack_file_send_sigiotask(), we need to allow privileged processes to receive SIGIO on files they're manipulating. (6) In smack_task_wait(), we let a process wait for a privileged process, whether or not the process doing the waiting is privileged. I've tested this with the LTP SELinux and syscalls testscripts. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>	2008-08-14 22:59:43 +10:00
Max Krasnyansky	cf417141cb	sched, cpuset: rework sched domains and CPU hotplug handling (v4) This is an updated version of my previous cpuset patch on top of the latest mainline git. The patch fixes CPU hotplug handling issues in the current cpusets code. Namely circular locking in rebuild_sched_domains() and unsafe access to the cpu_online_map in the cpuset cpu hotplug handler. This version includes changes suggested by Paul Jackson (naming, comments, style, etc). I also got rid of the separate workqueue thread because it is now safe to call get_online_cpus() from workqueue callbacks. Here are some more details: rebuild_sched_domains() is the only way to rebuild sched domains correctly based on the current cpuset settings. What this means is that we need to be able to call it from different contexts, like cpu hotplug for example. Also latest scheduler code in -tip now calls rebuild_sched_domains() directly from functions like arch_reinit_sched_domains(). In order to support that properly we need to rework cpuset locking rules to avoid circular dependencies, which is what this patch does. New lock nesting rules are explained in the comments. We can now safely call rebuild_sched_domains() from virtually any context. The only requirement is that it needs to be called under get_online_cpus(). This allows cpu hotplug handlers and the scheduler to call rebuild_sched_domains() directly. The rest of the cpuset code now offloads sched domains rebuilds to a workqueue (async_rebuild_sched_domains()). This version of the patch addresses comments from the previous review. I fixed all misformatted comments and trailing spaces. I also factored out the code that builds domain masks and split up CPU and memory hotplug handling. This was needed to simplify locking, to avoid unsafe access to the cpu_online_map from mem hotplug handler, and in general to make things cleaner. 
The patch passes moderate testing (building kernel with -j 16, creating & removing domains and bringing cpus off/online at the same time) on the quad-core2 based machine. It passes lockdep checks, even with preemptable RCU enabled. This time I also tested it with suspend/resume path and everything is working as expected. Signed-off-by: Max Krasnyansky <maxk@qualcomm.com> Acked-by: Paul Jackson <pj@sgi.com> Cc: menage@google.com Cc: a.p.zijlstra@chello.nl Cc: vegard.nossum@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-14 11:23:51 +02:00
Zhang, Yanmin	09f2724a78	sched: fix the race between walk_tg_tree and sched_create_group With 2.6.27-rc3, I hit a kernel panic when running volanoMark on my new x86_64 machine. I also hit it with other 2.6.27-rc kernels. See below log. Basically, function walk_tg_tree and sched_create_group have a race between accessing and initializing tg->children. The patch below fixes it by moving tg->children initialization to the front of linking tg->siblings to parent->children. {----------------panic log------------} BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 IP: [<ffffffff802292ab>] walk_tg_tree+0x45/0x7f PGD 1be1c4067 PUD 1bdd8d067 PMD 0 Oops: 0000 [1] SMP CPU 11 Modules linked in: igb Pid: 22979, comm: java Not tainted 2.6.27-rc3 #1 RIP: 0010:[<ffffffff802292ab>] [<ffffffff802292ab>] walk_tg_tree+0x45/0x7f RSP: 0018:ffff8801bfbbbd18 EFLAGS: 00010083 RAX: 0000000000000000 RBX: ffff8800be0dce40 RCX: ffffffffffffffc0 RDX: ffff880102c43740 RSI: 0000000000000000 RDI: ffff8800be0dce40 RBP: ffff8801bfbbbd48 R08: ffff8800ba437bc8 R09: 0000000000001f40 R10: ffff8801be812100 R11: ffffffff805fdf44 R12: ffff880102c43740 R13: 0000000000000000 R14: ffffffff8022cf0f R15: ffffffff8022749f FS: 00000000568ac950(0063) GS:ffff8801bfa26d00(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000001bd848000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process java (pid: 22979, threadinfo ffff8801b145a000, task ffff8801bf18e450) Stack: 0000000000000001 ffff8800ba5c8d60 0000000000000001 0000000000000001 ffff8800bad1ccb8 0000000000000000 ffff8801bfbbbd98 ffffffff8022ed37 0000000000000001 0000000000000286 ffff8801bd5ee180 ffff8800ba437bc8 Call Trace: <IRQ> [<ffffffff8022ed37>] try_to_wake_up+0x71/0x24c [<ffffffff80247177>] autoremove_wake_function+0x9/0x2e [<ffffffff80228039>] ? 
__wake_up_common+0x46/0x76 [<ffffffff802296d5>] __wake_up+0x38/0x4f [<ffffffff806169cc>] tcp_v4_rcv+0x380/0x62e Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-14 10:58:48 +02:00
Arjan van de Ven	2df8b1d656	lockdep: use WARN() in kernel/lockdep.c Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes part of the warning section for better reporting/collection. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2008-08-13 19:06:46 +02:00
Andrew Morton	c72f4573a5	lockdep: spin_lock_nest_lock(), checkpatch fixes fix: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable #46: FILE: kernel/spinlock.c:326: +EXPORT_SYMBOL(_spin_lock_nest_lock); total: 0 errors, 1 warnings, 26 lines checked Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-13 13:56:51 +02:00
Ingo Molnar	73909f7a66	Merge commit 'v2.6.27-rc3' into core/urgent	2008-08-13 13:56:44 +02:00
Ingo Molnar	d6672c5018	lockdep: build fix fix: kernel/built-in.o: In function `lockdep_stats_show': lockdep_proc.c:(.text+0x3cb2f): undefined reference to `lockdep_count_forward_deps' kernel/built-in.o: In function `l_show': lockdep_proc.c:(.text+0x3d02b): undefined reference to `lockdep_count_forward_deps' lockdep_proc.c:(.text+0x3d047): undefined reference to `lockdep_count_backward_deps' Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-08-13 12:55:10 +02:00

4776 Commits