linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-04 07:16:45 +07:00

Author	SHA1	Message	Date
Peter Zijlstra	4a55b45036	sched: improve prev_sum_exec_runtime setting Second preparatory patch for fix-ideal runtime: Mark prev_sum_exec_runtime at the beginning of our run, the same spot that adds our wait period to wait_runtime. This seems a more natural location to do this, and it also reduces the code a bit: text data bss dec hex filename 13397 228 1204 14829 39ed sched.o.before 13391 228 1204 14823 39e7 sched.o.after Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-09-05 14:32:49 +02:00
Peter Zijlstra	7c92e54f6f	sched: simplify __check_preempt_curr_fair() Preparatory patch for fix-ideal-runtime: simplify __check_preempt_curr_fair(): get rid of the integer return. text data bss dec hex filename 13404 228 1204 14836 39f4 sched.o.before 13393 228 1204 14825 39e9 sched.o.after functionality is unchanged. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-09-05 14:32:49 +02:00
Ingo Molnar	cf2ab4696e	sched: fix xtensa build warning rename RSR to SRR - 'RSR' is already defined on xtensa. found by Adrian Bunk. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-09-05 14:32:49 +02:00
Ingo Molnar	2491b2b89d	sched: debug: fix sum_exec_runtime clearing when cleaning sched-stats also clear prev_sum_exec_runtime. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-09-05 14:32:49 +02:00
Ingo Molnar	a206c07213	sched: debug: fix cfs_rq->wait_runtime accounting the cfs_rq->wait_runtime debug/statistics counter was not maintained properly - fix this. this also removes some code: text data bss dec hex filename 13420 228 1204 14852 3a04 sched.o.before 13404 228 1204 14836 39f4 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-09-05 14:32:49 +02:00
Ingo Molnar	a0dc72601d	sched: fix niced_granularity() shift fix niced_granularity(). This resulted in under-scheduling for CPU-bound negative nice level tasks (and this in turn caused higher than necessary latencies in nice-0 tasks). Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-09-05 14:32:49 +02:00
Suresh Siddha	7fd0d2dde9	sched: fix MC/HT scheduler optimization, without breaking the FUZZ logic. First fix the check if (imbalance + SCHED_LOAD_SCALE_FUZZ < busiest_load_per_task) with this if (imbalance < busiest_load_per_task) As the current check is always false for nice 0 tasks (as SCHED_LOAD_SCALE_FUZZ is same as busiest_load_per_task for nice 0 tasks). With the above change, imbalance was getting reset to 0 in the corner case condition, making the FUZZ logic fail. Fix it by not corrupting the imbalance and change the imbalance, only when it finds that the HT/MC optimization is needed. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-09-05 14:32:48 +02:00
Linus Torvalds	5e7a39275b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: clean up task_new_fair() sched: small schedstat fix sched: fix wait_start_fair condition in update_stats_wait_end() sched: call update_curr() in task_tick_fair() sched: make the scheduler converge to the ideal latency sched: fix sleeper bonus limit	2007-08-31 10:52:00 -07:00
Oleg Nesterov	60187d2708	sigqueue_free: fix the race with collect_signal() Spotted by taoyue <yue.tao@windriver.com> and Jeremy Katz <jeremy.katz@windriver.com>. collect_signal: sigqueue_free: list_del_init(&first->list); if (!list_empty(&q->list)) { // not taken } q->flags &= ~SIGQUEUE_PREALLOC; __sigqueue_free(first); __sigqueue_free(q); Now, __sigqueue_free() is called twice on the same "struct sigqueue" with the obviously bad implications. In particular, this double free breaks the array_cache->avail logic, so the same sigqueue could be "allocated" twice, and the bug can manifest itself via the "impossible" BUG_ON(!SIGQUEUE_PREALLOC) in sigqueue_free/send_sigqueue. Hopefully this can explain these mysterious bug-reports, see http://marc.info/?t=118766926500003 http://marc.info/?t=118466273000005 Alexey Dobriyan reports this patch makes the difference for the testcase, but nobody has an access to the application which opened the problems originally. Also, this patch removes tasklist lock/unlock, ->siglock is enough. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: taoyue <yue.tao@windriver.com> Cc: Jeremy Katz <jeremy.katz@windriver.com> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Roland McGrath <roland@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-31 01:42:23 -07:00
Alexey Dobriyan	99db67bc04	userns: don't leak root user Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Cedric Le Goater <clg@fr.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-31 01:42:23 -07:00
Jarek Poplawski	59845b1ffd	request_irq: fix DEBUG_SHIRQ handling Mariusz Kozlowski reported lockdep's warning: > ================================= > [ INFO: inconsistent lock state ] > 2.6.23-rc2-mm1 #7 > --------------------------------- > inconsistent {in-hardirq-W} -> {hardirq-on-W} usage. > ifconfig/5492 [HC0[0]:SC0[0]:HE1:SE1] takes: > (&tp->lock){+...}, at: [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too] > {in-hardirq-W} state was registered at: > [<c0138eeb>] __lock_acquire+0x949/0x11ac > [<c01397e7>] lock_acquire+0x99/0xb2 > [<c0452ff3>] _spin_lock+0x35/0x42 > [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too] > [<c0147a5d>] handle_IRQ_event+0x28/0x59 > [<c01493ca>] handle_level_irq+0xad/0x10b > [<c0105a13>] do_IRQ+0x93/0xd0 > [<c010441e>] common_interrupt+0x2e/0x34 ... > other info that might help us debug this: > 1 lock held by ifconfig/5492: > #0: (rtnl_mutex){--..}, at: [<c0451778>] mutex_lock+0x1c/0x1f > > stack backtrace: ... > [<c0452ff3>] _spin_lock+0x35/0x42 > [<de8706e0>] rtl8139_interrupt+0x27/0x46b [8139too] > [<c01480fd>] free_irq+0x11b/0x146 > [<de871d59>] rtl8139_close+0x8a/0x14a [8139too] > [<c03bde63>] dev_close+0x57/0x74 ... This shows that a driver's irq handler was running both in hard interrupt and process contexts with irqs enabled. The latter was done during free_irq() call and was possible only with CONFIG_DEBUG_SHIRQ enabled. This was fixed by another patch. But similar problem is possible with request_irq(): any locks taken from irq handler could be vulnerable - especially with soft interrupts. This patch fixes it by disabling local interrupts during handler's run. (It seems, disabling softirqs should be enough, but it needs more checking on possible races or other special cases). Reported-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-31 01:42:23 -07:00
Rafael J. Wysocki	f3de4be9d5	PM: Fix dependencies of CONFIG_SUSPEND and CONFIG_HIBERNATION Dependencies of CONFIG_SUSPEND and CONFIG_HIBERNATION introduced by commit `296699de6b` "Introduce CONFIG_SUSPEND for suspend-to-Ram and standby" are incorrect, as they don't cover the facts that (1) not all architectures support suspend and (2) SMP hibernation is only possible on X86 and PPC64 (if CONFIG_PPC64_SWSUSP is set). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-31 01:42:22 -07:00
Oleg Nesterov	b07e35f94a	setpgid(child) fails if the child was forked by sub-thread Spotted by Marcin Kowalczyk <qrczak@knm.org.pl>. sys_setpgid(child) fails if the child was forked by sub-thread. Fix the "is it our child" check. The previous commit `ee0acf90d3` was not complete. (this patch asks for the new same_thread_group() helper, but mainline doesn't have it yet). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Cc: <stable@kernel.org> Tested-by: "Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-31 01:42:22 -07:00
Jonathan Lim	f2ab6d8889	Assign task_struct.exit_code before taskstats_exit() taskstats.ac_exitcode is assigned to task_struct.exit_code in bacct_add_tsk() through the following kernel function calls: do_exit() taskstats_exit() fill_pid() bacct_add_tsk() The problem is that in do_exit(), task_struct.exit_code is set to 'code' only after taskstats_exit() has been called. So we need to move the assignment before taskstats_exit(). Signed-off-by: Jonathan Lim <jlim@sgi.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-31 01:42:22 -07:00
Ingo Molnar	9f508f8258	sched: clean up task_new_fair() cleanup: we have the 'se' and 'curr' entity-pointers already, no need to use p->se and current->se. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de>	2007-08-28 12:53:24 +02:00
Ingo Molnar	213c8af67f	sched: small schedstat fix small schedstat fix: the cfs_rq->wait_runtime 'sum of all runtimes' statistics counters missed newly forked tasks and thus had a constant negative skew. Fix this. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de>	2007-08-28 12:53:24 +02:00
Ingo Molnar	b77d69db9f	sched: fix wait_start_fair condition in update_stats_wait_end() Peter Zijlstra noticed the following bug in SCHED_FEAT_SKIP_INITIAL (which is disabled by default at the moment): it relies on se.wait_start_fair being 0 while update_stats_wait_end() did not recognize a 0 value, so instead of 'skipping' the initial interval we gave the new child a maximum boost of +runtime-limit ... (No impact on the default kernel, but nice to fix for completeness.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de>	2007-08-28 12:53:24 +02:00
Ting Yang	7109c4429a	sched: call update_curr() in task_tick_fair() update the fair-clock before using it for the key value. [ mingo@elte.hu: small cleanups. ] Signed-off-by: Ting Yang <tingy@cs.umass.edu> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-08-28 12:53:24 +02:00
Ingo Molnar	f6cf891c4d	sched: make the scheduler converge to the ideal latency de-HZ-ification of the granularity defaults unearthed a pre-existing property of CFS: while it correctly converges to the granularity goal, it does not prevent run-time fluctuations in the range of [-gran ... 0 ... +gran]. With the increase of the granularity due to the removal of HZ dependencies, this becomes visible in chew-max output (with 5 tasks running): out: 28 . 27. 32 \| flu: 0 . 0 \| ran: 9 . 13 \| per: 37 . 40 out: 27 . 27. 32 \| flu: 0 . 0 \| ran: 17 . 13 \| per: 44 . 40 out: 27 . 27. 32 \| flu: 0 . 0 \| ran: 9 . 13 \| per: 36 . 40 out: 29 . 27. 32 \| flu: 2 . 0 \| ran: 17 . 13 \| per: 46 . 40 out: 28 . 27. 32 \| flu: 0 . 0 \| ran: 9 . 13 \| per: 37 . 40 out: 29 . 27. 32 \| flu: 0 . 0 \| ran: 18 . 13 \| per: 47 . 40 out: 28 . 27. 32 \| flu: 0 . 0 \| ran: 9 . 13 \| per: 37 . 40 average slice is the ideal 13 msecs and the period is picture-perfect 40 msecs. But the 'ran' field fluctuates around 13.33 msecs and there's no mechanism in CFS to keep that from happening: it's a perfectly valid solution that CFS finds. to fix this we add a granularity/preemption rule that knows about the "target latency", which makes tasks that run longer than the ideal latency run a bit less. The simplest approach is to simply decrease the preemption granularity when a task overruns its ideal latency. For this we have to track how much the task executed since its last preemption. ( this adds a new field to task_struct, but we can eliminate that overhead in 2.6.24 by putting all the scheduler timestamps into an anonymous union. ) with this change in place, chew-max output is fluctuation-less all around: out: 28 . 27. 39 \| flu: 0 . 2 \| ran: 13 . 13 \| per: 41 . 40 out: 28 . 27. 39 \| flu: 0 . 2 \| ran: 13 . 13 \| per: 41 . 40 out: 28 . 27. 39 \| flu: 0 . 2 \| ran: 13 . 13 \| per: 41 . 40 out: 28 . 27. 39 \| flu: 0 . 2 \| ran: 13 . 13 \| per: 41 . 40 out: 28 . 27. 39 \| flu: 0 . 1 \| ran: 13 . 13 \| per: 41 . 40 out: 28 . 27. 39 \| flu: 0 . 1 \| ran: 13 . 13 \| per: 41 . 40 this patch has no impact on any fastpath or on any globally observable scheduling property. (unless you have sharp enough eyes to see millisecond-level ruckles in glxgears smoothness :-) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de>	2007-08-28 12:53:24 +02:00
Mike Galbraith	5f01d519e6	sched: fix sleeper bonus limit There is an Amarok song switch time increase (regression) under hefty load. What is happening is that sleeper_bonus is never consumed, and only rarely goes below runtime_limit, so for the most part, Amarok isn't getting any bonus at all. We're keeping sleeper_bonus right at runtime_limit (sched_latency == sched_runtime_limit == 40ms) forever, ie we don't consume if we're lower that that, and don't add if we're above it. One Amarok thread waking (or anybody else) will push us past the threshold, so the next thread waking gets nada, but will reap pain from the previous thread waking until we drop back to runtime_limit. It looks to me like under load, some random task gets a bonus, and everybody else pays, whether deserving or not. This diff fixed the regression for me at any load rate. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-08-28 12:53:24 +02:00
Hugh Dickins	d243769d3f	fix bogus hotplug cpu warning Fix bogus DEBUG_PREEMPT warning on x86_64, when cpu brought online after bootup: current_is_keventd is right to note its use of smp_processor_id is preempt-safe, but should use raw_smp_processor_id to avoid the warning. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-27 10:27:48 -07:00
Ingo Molnar	50c46637aa	sched: s/sched_latency/sched_min_granularity runtime limit and wakeup granularity used to be a function of granularity and that was incorrect changed to sched_latency. Fix this to make wakeup granularity a function of min-granularity, and the runtime limit equal to latency. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-25 22:17:19 +02:00
Ingo Molnar	172ac3dbb7	sched: cleanup, sched_granularity -> sched_min_granularity due to adaptive granularity scheduling the role of sched_granularity has changed to "minimum granularity", so rename the variable (and the tunable) accordingly. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>	2007-08-25 18:41:53 +02:00
Peter Zijlstra	218050855e	sched: adaptive scheduler granularity Instead of specifying the preemption granularity, specify the wanted latency. By fixing the granlarity to a constany the wakeup latency it a function of the number of running tasks on the rq. Invert this relation. sysctl_sched_granularity becomes a minimum for the dynamic granularity computed from the new sysctl_sched_latency. Then use this latency to do more intelligent granularity decisions: if there are fewer tasks running then we can schedule coarser. This helps performance while still always keeping the latency target. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-25 18:41:53 +02:00
Peter Zijlstra	1fc84aaae3	sched: fix CONFIG_SCHED_DEBUG dependency of lockdep sysctls Make the lockdep sysctls not depend on CONFIG_SCHED_DEBUG. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-25 18:41:52 +02:00
Ingo Molnar	095e56c703	sched: fix startup penalty calculation fix task startup penalty miscalculation: sysctl_sched_granularity is unsigned int and wait_runtime is long so we first have to convert it to long before turning it negative ... Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Peter Zijlstra	ea0aa3b23a	sched: simplify bonus calculation #2 current code: delta = calc_delta_mine(delta_exec, curr->load.weight, lw); delta = min((u64)delta, cfs_rq->sleeper_bonus); Notice that this calc_delta_mine() line is exactly delta_mine, which gives: delta = min((u64)delta_mine, cfs_rq->sleeper_bonus); Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Peter Zijlstra	a6f2994042	sched: simplify bonus calculation #1 current code: delta = min(cfs_rq->sleeper_bonus, (u64)delta_exec); delta = calc_delta_mine(delta, curr->load.weight, lw); delta = min((u64)delta, cfs_rq->sleeper_bonus); drop the first min(), because we clip against sleeper_bonus in the 3rd line again. That gives: delta = calc_delta_mine(delta_exec, curr->load.weight, lw); delta = min((u64)delta, cfs_rq->sleeper_bonus); Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Ingo Molnar	b2133c8b1e	sched: tidy up and simplify the bonus balance make the bonus balance more consistent: do not hand out a bonus if there's too much in flight already, and only deduct as much from a runner as it has the capacity. This makes the bonus engine a zero-sum game (as intended). this also simplifies the code: text data bss dec hex filename 34770 2998 24 37792 93a0 sched.o.before 34749 2998 24 37771 938b sched.o.after and it also avoids overscheduling in sleep-happy workloads like hackbench.c. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Dmitry Adamushko	98fbc79853	sched: optimize task_tick_rt() a bit Mitchell Erblich suggested a quality-of-implementation change to not requeue SCHED_RR tasks if there's only a single task on the runqueue, by checking for rq->nr_running == 1. provide a more efficient implementation of that, to check that particular RT priority-queue only. [ From: mingo@elte.hu ] Also first requeue the task then set need_resched - results in slightly better machine-instruction ordering. Also clean up the code a bit. Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Sven-Thorsten Dietrich	deac4ee65a	sched: simplify can_migrate_task() Remove trivial conditional branch in Linux scheduler's can_migrate_task() function. text data bss dec hex filename 34770 2998 24 37792 93a0 sched.o.before 34757 2998 24 37779 9393 sched.o.after Signed-off-by: Sven-Thorsten Dietrich <sven@thebigcorporation.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Ingo Molnar	71fd371463	sched: remove HZ dependency from the granularity default remove HZ dependency from the granularity default. Use 10 msec for the base granularity, 1 msec for wakeup granularity and 25 msec for batch wakeup granularity. (These defaults are close to the values that the default HZ=250 setting got previously, and thus it's the most common setting.) Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Bruce Ashfield	7c6c16f354	sched: CONFIG_SCHED_GROUP_FAIR=y fixlet when I built with CONFIG_FAIR_GROUP_SCHED=y, I need the following change to make things right. [ From: mingo@elte.hu ] this config option is not upstream-configurable right now but lets fix this for completeness. Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-24 20:39:10 +02:00
Linus Torvalds	d0797b39dc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: tweak the sched_runtime_limit tunable sched: skip updating rq's next_balance under null SD sched: fix broken SMT/MC optimizations sched: accounting regression since rc1 sched: fix sysctl directory permissions sched: sched_clock_idle_[sleep\|wakeup]_event()	2007-08-23 21:38:39 -07:00
Linus Torvalds	de80af4cc9	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: sysfs: don't warn on removal of a nonexistent binary file HOWTO: latest lxr url address changed HOWTO: korean translation of Documentation/HOWTO Fix Off-by-one in /sys/module/*/refcnt sysfs: fix locking in sysfs_lookup() and sysfs_rename_dir()	2007-08-23 21:34:43 -07:00
Ingo Molnar	505c0efd58	sched: tweak the sched_runtime_limit tunable Michael Gerdau reported reniced task CPU usage weirdnesses. Such symptoms can be caused by limit underruns so double the sched_runtime_limit. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-23 15:18:02 +02:00
Suresh Siddha	f549da848e	sched: skip updating rq's next_balance under null SD Was playing with sched_smt_power_savings/sched_mc_power_savings and found out that while the scheduler domains are reconstructed when sysfs settings change, rebalance_domains() can get triggered with null domain on other cpus, which is setting next_balance to jiffies + 60*HZ. Resulting in no idle/busy balancing for 60 seconds. Fix this. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-23 15:18:02 +02:00
Suresh Siddha	f8700df7c4	sched: fix broken SMT/MC optimizations On a four package system with HT - HT load balancing optimizations were broken. For example, if two tasks end up running on two logical threads of one of the packages, scheduler is not able to pull one of the tasks to a completely idle package. In this scenario, for nice-0 tasks, imbalance calculated by scheduler will be 512 and find_busiest_queue() will return 0 (as each cpu's load is 1024 > imbalance and has only one task running). Similarly MC scheduler optimizations also get fixed with this patch. [ mingo@elte.hu: restored fair balancing by increasing the fuzz and adding it back to the power decision, without the /2 factor. ] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-23 15:18:02 +02:00
Eric W. Biederman	c57baf1e1e	sched: fix sysctl directory permissions There are two remaining gotchas: - The directories have impossible permissions (writeable). - The ctl_name for the kernel directory is inconsistent with everything else. It should be CTL_KERN. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-08-23 15:18:02 +02:00
Ingo Molnar	2aa44d0567	sched: sched_clock_idle_[sleep\|wakeup]_event() construct a more or less wall-clock time out of sched_clock(), by using ACPI-idle's existing knowledge about how much time we spent idling. This allows the rq clock to work around TSC-stops-in-C2, TSC-gets-corrupted-in-C3 type of problems. ( Besides the scheduler's statistics this also benefits blktrace and printk-timestamps as well. ) Furthermore, the precise before-C2/C3-sleep and after-C2/C3-wakeup callbacks allow the scheduler to get out the most of the period where the CPU has a reliable TSC. This results in slightly more precise task statistics. the ACPI bits were acked by Len. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Len Brown <len.brown@intel.com>	2007-08-23 15:18:02 +02:00
Oleg Nesterov	834d216e1f	signalfd: fix interaction with posix-timers dequeue_signal: if (__SI_TIMER) { spin_unlock(&tsk->sighand->siglock); do_schedule_next_timer(info); spin_lock(&tsk->sighand->siglock); } Unless tsk == curent, this is absolutely unsafe: nothing prevents tsk from exiting. If signalfd was passed to another process, do_schedule_next_timer() is just wrong. Add yet another "tsk == current" check into dequeue_signal(). This patch fixes an oopsable bug, but breaks the scheduling of posix timers if the shared __SI_TIMER signal was fetched via signalfd attached to another sub-thread. Mostly fixed by the next patch. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Roland McGrath <roland@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-22 19:52:46 -07:00
Oleg Nesterov	d02479bdeb	posix-timers: fix creation race sys_timer_create() sets ->it_process and unlocks ->siglock, then checks tmr->it_sigev_notify to define if get_task_struct() is needed. We already passed ->it_id to the caller, another thread can delete this timer and free its memory in between. As a minimal fix, move this code under ->siglock, sys_timer_delete() takes it too before calling release_posix_timer(). A proper serialization would be to take ->it_lock, we add a partly initialized timer on posix_timers_id, not good. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-22 19:52:46 -07:00
Thomas Gleixner	179394af7a	posix-timers: fix deletion race timer_delete does: lock_timer(); timer->it_process = NULL; unlock_timer(); release_posix_timer(); timer->it_process is checked in lock_timer() to prevent access to a timer, which is on the way to be deleted, but the check happens after idr_lock is dropped. This allows release_posix_timer() to delete the timer before the lock code can check the timer: CPU 0 CPU 1 lock_timer(); timer->it_process = NULL; unlock_timer(); lock_timer() spin_lock(idr_lock); timer = idr_find(); spin_lock(timer->lock); spin_unlock(idr_lock); release_posix_timer(); spin_lock(idr_lock); idr_remove(timer); spin_unlock(idr_lock); free_timer(timer); if (timer->......) Change the locking to prevent this. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-22 19:52:45 -07:00
Andrew Morton	8b7f07155f	free_irq(): fix DEBUG_SHIRQ handling If we're going to run the handler from free_irq() then we must do it with local irq's disabled. Otherwise lockdep complains that the handler is taking irq-safe spinlocks in a non-irq-safe fashion. Cc: Ingo Molnar <mingo@elte.hu> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-22 19:52:44 -07:00
john stultz	187226f57f	futex_unlock_pi() hurts my brain and may cause application deadlock Avoid futex_unlock_pi returning -EFAULT (which results in deadlock), by clearing uval before jumping to retry_locked. Signed-off-by: John Stultz <johnstul@us.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-22 19:52:44 -07:00
Adrian Bunk	88ae704c2a	kernel/auditsc.c: fix an off-by-one This patch fixes an off-by-one in a BUG_ON() spotted by the Coverity checker. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Amy Griffis <amy.griffis@hp.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-22 19:52:44 -07:00
Alexey Dobriyan	256e2fdf03	Fix Off-by-one in /sys/module/*/refcnt sysfs internals were changed to not pin module in question. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Kay Sievers <kay.sievers@vrfy.org> Acked-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-08-22 14:35:35 -07:00
Robin Getz	cb00e99c0a	fix - ensure we don't use bootconsoles after init has been released Gerd Hoffmann pointed out that my patch from yesterday can lead to a null pointer dereference if the kernel is booted with no console, and no earlyprintk defined. This fixes that issue. Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-21 20:23:53 -07:00
Robin Getz	0c5564bd91	ensure we don't use bootconsoles after init has been released This is a followup to the cleanups for earlyprintk patch from Gerd Hoffmann http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=69331af79cf29e26d1231152a172a1a10c2df511 This ensures that a bootconsole is unregistered if it is not replaced. The current implementation spews garbage out the bootconsole in this case, since the bootconsole structure is normally in the init section, and is freed, but still used. Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: Mike Frysinger <vapier.adi@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-20 22:42:01 -07:00
Christian Heim	e598fbaabd	Remove double inclusion of linux/capability.h Remove the second inclusion of linux/capability.h, which has been introduced with "[PATCH] move capable() to capability.h" (commit `c59ede7b78`) Signed-off-by: Christian Heim <phreak@gentoo.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-08-19 10:12:32 -07:00

1 2 3 4 5 ...

2522 Commits