Commit Graph

275488 Commits

Author SHA1 Message Date
Suresh Siddha
cd490c5b28 sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
Intention is to set the NOHZ_BALANCE_KICK flag for the 'ilb_cpu'. Not
for the 'cpu' which is the local cpu. Fix the typo.

Reported-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1323199594.1984.18.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 20:51:29 +01:00
Suresh Siddha
8a6d42d1b3 sched, nohz: Fix the idle cpu check in nohz_idle_balance
cpu bit in the nohz.idle_cpu_mask are reset in the first busy tick after
exiting idle. So during nohz_idle_balance(), intention is to double
check if the cpu that is part of the idle_cpu_mask is indeed idle before
going ahead in performing idle balance for that cpu.

Fix the cpu typo in the idle_cpu() check during nohz_idle_balance().

Reported-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1323199177.1984.12.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 20:51:27 +01:00
Peter Zijlstra
f8b6d1cc7d sched: Use jump_labels for sched_feat
Now that we initialize jump_labels before sched_init() we can use them
for the debug features without having to worry about a window where
they have the wrong setting.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-vpreo4hal9e0kzqmg5y0io2k@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 20:51:26 +01:00
Glauber Costa
be726ffd1e sched/accounting: Fix parameter passing in task_group_account_field
The order of parameters is inverted. The index parameter
should come first.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1322863119-14225-3-git-send-email-glommer@parallels.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 20:51:24 +01:00
Glauber Costa
1c77f38ad6 sched/accounting: Fix user/system tick double accounting
Now that we're  pointing cpuacct's root cgroup to cpustat and accounting
through task_group_account_field(), we should not access cpustat directly.
Since it is done anyway inside the acessor function, we end up accounting
it twice, which is wrong.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1322863119-14225-2-git-send-email-glommer@parallels.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 20:51:23 +01:00
Glauber Costa
54c707e98d sched/accounting: Re-use scheduler statistics for the root cgroup
Right now, after we collect tick statistics for user and system and store them
in a well known location, we keep the same statistics again for cpuacct.
Since cpuacct is hierarchical, the numbers for the root cgroup should be
absolutely equal to the system-wide numbers.

So it would be better to just use it: this patch changes cpuacct accounting
in a way that the cpustat statistics are kept in a struct kernel_cpustat percpu
array. In the root cgroup case, we just point it to the main array. The rest of
the hierarchy walk can be totally disabled later with a static branch - but I am
not doing it here.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Tuner <pjt@google.com>
Link: http://lkml.kernel.org/r/1322498719-2255-4-git-send-email-glommer@parallels.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 20:51:21 +01:00
Mike Galbraith
b39e66eaf9 sched: Save some hrtick_start_fair cycles
hrtick_start_fair() shows up in profiles even when disabled.

v3.0.6

taskset -c 3 pipe-test

   PerfTop:     997 irqs/sec  kernel:89.5%  exact:  0.0% [1000Hz cycles],  (all, CPU: 3)
------------------------------------------------------------------------------------------------

             Virgin                                    Patched
             samples  pcnt function                    samples  pcnt function
             _______ _____ ___________________________ _______ _____ ___________________________

             2880.00 10.2% __schedule                  3136.00 11.3% __schedule
             1634.00  5.8% pipe_read                   1615.00  5.8% pipe_read
             1458.00  5.2% system_call                 1534.00  5.5% system_call
             1382.00  4.9% _raw_spin_lock_irqsave      1412.00  5.1% _raw_spin_lock_irqsave
             1202.00  4.3% pipe_write                  1255.00  4.5% copy_user_generic_string
             1164.00  4.1% copy_user_generic_string    1241.00  4.5% __switch_to
             1097.00  3.9% __switch_to                  929.00  3.3% mutex_lock
              872.00  3.1% mutex_lock                   846.00  3.0% mutex_unlock
              687.00  2.4% mutex_unlock                 804.00  2.9% pipe_write
              682.00  2.4% native_sched_clock           713.00  2.6% native_sched_clock
              643.00  2.3% system_call_after_swapgs     653.00  2.3% _raw_spin_unlock_irqrestore
              617.00  2.2% sched_clock_local            633.00  2.3% fsnotify
              612.00  2.2% fsnotify                     605.00  2.2% sched_clock_local
              596.00  2.1% _raw_spin_unlock_irqrestore  593.00  2.1% system_call_after_swapgs
              542.00  1.9% sysret_check                 559.00  2.0% sysret_check
              467.00  1.7% fget_light                   472.00  1.7% fget_light
              462.00  1.6% finish_task_switch           461.00  1.7% finish_task_switch
              437.00  1.5% vfs_write                    442.00  1.6% vfs_write
              431.00  1.5% do_sync_write                428.00  1.5% do_sync_write
              413.00  1.5% select_task_rq_fair          404.00  1.5% _raw_spin_lock_irq
              386.00  1.4% update_curr                  402.00  1.4% update_curr
              385.00  1.4% rw_verify_area               389.00  1.4% do_sync_read
              377.00  1.3% _raw_spin_lock_irq           378.00  1.4% vfs_read
              369.00  1.3% do_sync_read                 340.00  1.2% pipe_iov_copy_from_user
              360.00  1.3% vfs_read                     316.00  1.1% __wake_up_sync_key
*             342.00  1.2% hrtick_start_fair            313.00  1.1% __wake_up_common

Signed-off-by: Mike Galbraith <efault@gmx.de>
[ fixed !CONFIG_SCHED_HRTICK borkage ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1321971607.6855.17.camel@marge.simson.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 20:51:20 +01:00
Peter Zijlstra
fdaabd800b sched: Fix compile error for UP,!NOHZ
Commit 69e1e811 ("sched, nohz: Track nr_busy_cpus in the
sched_group_power") messed up the static inline function definition.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/n/tip-abjah8ctq5qrjjtdiabe8lph@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 20:51:18 +01:00
Glauber Costa
44252e421a sched/accounting, cgroups: Reuse cgroup's parent pointer
We already have a pointer to the cgroup parent (whose data is more likely
to be in the cache than this, anyway), so there is no need to have this one
in cpuacct.

This patch makes the underlying cgroup be used instead.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Tuner <pjt@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1322498719-2255-3-git-send-email-glommer@parallels.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 09:06:40 +01:00
Glauber Costa
3292beb340 sched/accounting: Change cpustat fields to an array
This patch changes fields in cpustat from a structure, to an
u64 array. Math gets easier, and the code is more flexible.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Tuner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1322498719-2255-2-git-send-email-glommer@parallels.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 09:06:38 +01:00
Suresh Siddha
786d6dc7ae sched, nohz: Clean up the find_new_ilb() using sched groups nr_busy_cpus
nr_busy_cpus in the sched_group_power indicates whether the group
is semi idle or not. This helps remove the is_semi_idle_group() and simplify
the find_new_ilb() in the context of finding an optimal cpu that can do
idle load balancing.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20111202010832.656983582@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 09:06:36 +01:00
Suresh Siddha
0b005cf54e sched, nohz: Implement sched group, domain aware nohz idle load balancing
When there are many logical cpu's that enter and exit idle often, members of
the global nohz data structure are getting modified very frequently causing
lot of cache-line contention.

Make the nohz idle load balancing more scalabale by using the sched domain
topology and 'nr_busy_cpu's in the struct sched_group_power.

Idle load balance is kicked on one of the idle cpu's when there is atleast
one idle cpu and:

 - a busy rq having more than one task or

 - a busy rq's scheduler group that share package resources (like HT/MC
   siblings) and has more than one member in that group busy or

 - for the SD_ASYM_PACKING domain, if the lower numbered cpu's in that
   domain are idle compared to the busy ones.

This will help in kicking the idle load balancing request only when
there is a potential imbalance. And once it is mostly balanced, these kicks will
be minimized.

These changes helped improve the workload that is context switch intensive
between number of task pairs by 2x on a 8 socket NHM-EX based system.

Reported-by: Tim Chen <tim.c.chen@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20111202010832.602203411@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 09:06:34 +01:00
Suresh Siddha
69e1e811dc sched, nohz: Track nr_busy_cpus in the sched_group_power
Introduce nr_busy_cpus in the struct sched_group_power [Not in sched_group
because sched groups are duplicated for the SD_OVERLAP scheduler domain]
and for each cpu that enters and exits idle, this parameter will
be updated in each scheduler group of the scheduler domain that this cpu
belongs to.

To avoid the frequent update of this state as the cpu enters
and exits idle, the update of the stat during idle exit is
delayed to the first timer tick that happens after the cpu becomes busy.
This is done using NOHZ_IDLE flag in the struct rq's nohz_flags.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20111202010832.555984323@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 09:06:32 +01:00
Suresh Siddha
1c792db7f7 sched, nohz: Introduce nohz_flags in 'struct rq'
Introduce nohz_flags in the struct rq, which will track these two flags
for now.

NOHZ_TICK_STOPPED keeps track of the tick stopped status that gets set when
the tick is stopped. It will be used to update the nohz idle load balancer data
structures during the first busy tick after the tick is restarted. At this
first busy tick after tickless idle, NOHZ_TICK_STOPPED flag will be reset.
This will minimize the nohz idle load balancer status updates that currently
happen for every tickless exit, making it more scalable when there
are many logical cpu's that enter and exit idle often.

NOHZ_BALANCE_KICK will track the need for nohz idle load balance
on this rq. This will replace the nohz_balance_kick in the rq, which was
not being updated atomically.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20111202010832.499438999@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 09:06:30 +01:00
Shan Hai
5b680fd613 sched/rt: Code cleanup, remove a redundant function call
The second call to sched_rt_period() is redundant, because the value of the
rt_runtime was already read and it was protected by the ->rt_runtime_lock.

Signed-off-by: Shan Hai <haishan.bai@gmail.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1322535836-13590-2-git-send-email-haishan.bai@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 09:06:28 +01:00
Suresh Siddha
4d78a2239e sched: Fix the sched group node allocation for SD_OVERLAP domains
For the SD_OVERLAP domain, sched_groups for each CPU's sched_domain are
privately allocated and not shared with any other cpu. So the
sched group allocation should come from the cpu's node for which
SD_OVERLAP sched domain is being setup.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111118230554.164910950@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 09:06:26 +01:00
Mike Galbraith
916671c08b sched: Set skip_clock_update in yield_task_fair()
This is another case where we are on our way to schedule(),
so can save a useless clock update and resulting microscopic
vruntime update.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1321971686.6855.18.camel@marge.simson.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 09:06:24 +01:00
Mike Galbraith
76854c7e8f sched: Use rt.nr_cpus_allowed to recover select_task_rq() cycles
rt.nr_cpus_allowed is always available, use it to bail from select_task_rq()
when only one cpu can be used, and saves some cycles for pinned tasks.

See the line marked with '*' below:

  # taskset -c 3 pipe-test

   PerfTop:     997 irqs/sec  kernel:89.5%  exact:  0.0% [1000Hz cycles],  (all, CPU: 3)
------------------------------------------------------------------------------------------------

             Virgin                                    Patched
             samples  pcnt function                    samples  pcnt function
             _______ _____ ___________________________ _______ _____ ___________________________

             2880.00 10.2% __schedule                  3136.00 11.3% __schedule
             1634.00  5.8% pipe_read                   1615.00  5.8% pipe_read
             1458.00  5.2% system_call                 1534.00  5.5% system_call
             1382.00  4.9% _raw_spin_lock_irqsave      1412.00  5.1% _raw_spin_lock_irqsave
             1202.00  4.3% pipe_write                  1255.00  4.5% copy_user_generic_string
             1164.00  4.1% copy_user_generic_string    1241.00  4.5% __switch_to
             1097.00  3.9% __switch_to                  929.00  3.3% mutex_lock
              872.00  3.1% mutex_lock                   846.00  3.0% mutex_unlock
              687.00  2.4% mutex_unlock                 804.00  2.9% pipe_write
              682.00  2.4% native_sched_clock           713.00  2.6% native_sched_clock
              643.00  2.3% system_call_after_swapgs     653.00  2.3% _raw_spin_unlock_irqrestore
              617.00  2.2% sched_clock_local            633.00  2.3% fsnotify
              612.00  2.2% fsnotify                     605.00  2.2% sched_clock_local
              596.00  2.1% _raw_spin_unlock_irqrestore  593.00  2.1% system_call_after_swapgs
              542.00  1.9% sysret_check                 559.00  2.0% sysret_check
              467.00  1.7% fget_light                   472.00  1.7% fget_light
              462.00  1.6% finish_task_switch           461.00  1.7% finish_task_switch
              437.00  1.5% vfs_write                    442.00  1.6% vfs_write
              431.00  1.5% do_sync_write                428.00  1.5% do_sync_write
*             413.00  1.5% select_task_rq_fair          404.00  1.5% _raw_spin_lock_irq
              386.00  1.4% update_curr                  402.00  1.4% update_curr
              385.00  1.4% rw_verify_area               389.00  1.4% do_sync_read
              377.00  1.3% _raw_spin_lock_irq           378.00  1.4% vfs_read
              369.00  1.3% do_sync_read                 340.00  1.2% pipe_iov_copy_from_user
              360.00  1.3% vfs_read                     316.00  1.1% __wake_up_sync_key
              342.00  1.2% hrtick_start_fair            313.00  1.1% __wake_up_common

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1321971504.6855.15.camel@marge.simson.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 08:51:26 +01:00
Suresh Siddha
77e81365e0 sched: Clean up domain traversal in select_idle_sibling()
Instead of going through the scheduler domain hierarchy multiple times
(for giving priority to an idle core over an idle SMT sibling in a busy
core), start with the highest scheduler domain with the SD_SHARE_PKG_RESOURCES
flag and traverse the domain hierarchy down till we find an idle group.

This cleanup also addresses an issue reported by Mike where the recent
changes returned the busy thread even in the presence of an idle SMT
sibling in single socket platforms.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1321556904.15339.25.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 08:51:25 +01:00
Andrew Vagin
b781a602ac events, sched: Add tracepoint for accounting blocked time
This tracepoint shows how long a task is sleeping in uninterruptible state.

E.g. it may show how long and where a mutex is waited for.

Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1322471015-107825-8-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-06 08:51:23 +01:00
Peter Zijlstra
391e43da79 sched: Move all scheduler bits into kernel/sched/
There's too many sched*.[ch] files in kernel/, give them their own
directory.

(No code changed, other than Makefile glue added.)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-17 12:20:22 +01:00
Peter Zijlstra
029632fbb7 sched: Make separate sched*.c translation units
Since once needs to do something at conferences and fixing compile
warnings doesn't actually require much if any attention I decided
to break up the sched.c #include "*.c" fest.

This further modularizes the scheduler code.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-x0fcd3mnp8f9c99grcpewmhi@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-17 12:20:19 +01:00
Richard Weinberger
60686317da sched: Fix comment for requeue_rt_entity
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1321117677-3282-1-git-send-email-richard@nod.at
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-16 08:48:27 +01:00
Andrew Vagin
a3e5d1091c sched: Don't call task_group() too many times in set_task_rq()
It improves perfomance, especially if autogroup is enabled.

The size of set_task_rq() was 0x180 and now it is 0xa0.

Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1321020240-3874331-1-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-16 08:48:24 +01:00
Glauber Costa
f4d6f6c264 sched, trivial: Initialize root cgroup's sibling list
Even though there are no siblings, the list should be
initialized to not contain bogus values.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Paul Menage <paul@paulmenage.org>
Acked-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1320182360-20043-2-git-send-email-glommer@parallels.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-16 08:48:22 +01:00
Paul Turner
56f570e512 sched: Use jump labels to reduce overhead when bandwidth control is inactive
Now that the linkage of jump-labels has been fixed they show a measurable
improvement in overhead for the enabled-but-unused case.

Workload is:

 'taskset -c 0 perf stat --repeat 50 -e instructions,cycles,branches
  bash -c "for ((i=0;i<5;i++)); do $(dirname $0)/pipe-test 20000; done"'

There's a speedup for all situations:

                 instructions            cycles                  branches
 -------------------------------------------------------------------------
 Intel Westmere
 base            806611770               745895590               146765378
 +jumplabel      803090165 (-0.44%)      713381840 (-4.36%)      144561130

 AMD Barcelona
 base            824657415               740055589               148855354
 +jumplabel      821056910 (-0.44%)      737558389 (-0.34%)      146635229

Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111108042736.560831357@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-16 08:48:18 +01:00
Paul Turner
fccfdc6f0d sched: Fix buglet in return_cfs_rq_runtime()
In return_cfs_rq_runtime() we want to return bandwidth when there are no
remaining tasks, not "return" when this is the case.

Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111108042736.623812423@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-16 08:43:45 +01:00
Peter Zijlstra
4dcfe1025b sched: Avoid SMT siblings in select_idle_sibling() if possible
Avoid select_idle_sibling() from picking a sibling thread if there's
an idle core that shares cache.

This fixes SMT balancing in the increasingly common case where there's
a shared cache core available to balance to.

Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1321350377.1421.55.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-16 08:43:43 +01:00
Carsten Emde
f1c6f1a7ee sched: Set the command name of the idle tasks in SMP kernels
In UP systems, the idle task is initialized using the init_task
structure from which the command name is taken (currently "swapper").

In SMP systems, one idle task per CPU is forked by the worker thread
from which the task structure is copied. The command name is, therefore,
"kworker/0:0" or "kworker/0:1", if not updated. Since such update was
lacking, all idle tasks in SMP systems were incorrectly named. This
longtime bug was not discovered immediately, because there is no /proc/0
entry - the bug only becomes apparent when tracing is enabled.

This patch sets the command name of the idle tasks in SMP systems to the
name that is used in the INIT_TASK structure suffixed by a slash and the
number of the CPU.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111026211708.768925506@osadl.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14 12:50:43 +01:00
Peter Zijlstra
4a6184ce7a sched, rt: Provide means of disabling cross-cpu bandwidth sharing
Normally the RT bandwidth scheme will share bandwidth across the
entire root_domain. However sometimes its convenient to disable this
sharing for debug purposes. Provide a simple feature switch to this
end.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14 12:50:40 +01:00
J. Bruce Fields
c6dc7f055d sched: Document wait_for_completion_*() return values
The return-value convention for these functions varies depending on
whether they're interruptible or can timeout.  It can be a little
confusing--document it.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111006192246.GB28026@fieldses.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14 12:50:37 +01:00
Hui Kang
461819ac8e sched_fair: Fix a typo in the comment describing update_sd_lb_stats
Signed-off-by: Hui Kang <hkang.sunysb@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1318388459-4427-1-git-send-email-hkang.sunysb@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14 12:50:34 +01:00
Peter Zijlstra
cf5f0acf39 sched: Add a comment to effective_load() since it's a pain
Every time I have to stare at this function I need to completely
reverse engineer its workings, about time I write a comment
explaining the thing.

Collected bits and pieces from previous changelogs, mostly:

  4be9daaa1b
  83378269a5

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1318518057.27731.2.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14 12:50:32 +01:00
Linus Torvalds
7f80850d3f Merge branch 'rmobile-fixes-for-linus' of git://github.com/pmundt/linux-sh
* 'rmobile-fixes-for-linus' of git://github.com/pmundt/linux-sh:
  ARM: mach-shmobile: cpuidle single/global and last_state fixes
  ARM: mach-shmobile: move helper macro PORTCR to sh_pfc.h
  ARM: mach-shmobile: move helper macro PORT_xx to sh_pfc.h
  ARM: mach-shmobile: move helper macro PORT_DATA_xx to sh_pfc.h
  ARM: mach-shmobile: ap4evb: remove white space from end of line
  ARM: mach-shmobile: clock-sh7372: remove un-necessary index
  ARM: mach-shmobile: kota2: add comment out separator
  ARM: mach-shmobile: sh73a0: add MMC data pin pull-up
2011-11-14 06:47:04 -02:00
Linus Torvalds
b93cd6a0c7 Merge branch 'sh-fixes-for-linus' of git://github.com/pmundt/linux-sh
* 'sh-fixes-for-linus' of git://github.com/pmundt/linux-sh:
  mailmap: Fix up some renesas attributions
  sh: clkfwk: Kill off remaining debugfs cruft.
  drivers: sh: Kill off dead pathname for runtime PM stub.
  drivers: sh: Generalize runtime PM platform stub.
  sh: Wire up process_vm syscalls.
  sh: clkfwk: add clk_rate_mult_range_round()
  serial: sh-sci: Fix up SH-2A SCIF support.
  sh: Fix cached/uncaced address calculation in 29bit mode
2011-11-14 06:45:30 -02:00
Linus Torvalds
d291ffb3cf Merge git://github.com/rustyrussell/linux
* git://github.com/rustyrussell/linux:
  virtio-pci: fix use after free
2011-11-14 04:00:01 -02:00
Michael S. Tsirkin
72103bd128 virtio-pci: fix use after free
Commit 31a3ddda16 introduced
a use after free in virtio-pci. The main issue is
that the release method signals removal of the virtio device,
while remove signals removal of the pci device.

For example, on driver removal or hot-unplug,
virtio_pci_release_dev is called before virtio_pci_remove.
We then might get a crash as virtio_pci_remove tries to use the
device freed by virtio_pci_release_dev.

We allocate/free all resources together with the
pci device, so we can leave the release method empty.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
2011-11-14 11:16:26 +10:30
Linus Torvalds
52e4c2a052 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon/kms/combios: fix dynamic allocation of PM clock modes
2011-11-13 17:09:55 -02:00
Rafael J. Wysocki
3439a8da16 ACPI / cpuidle: Remove acpi_idle_suspend (to fix suspend regression)
After commit e978aa7d7d ("cpuidle: Move dev->last_residency update to
driver enter routine; remove dev->last_state") setting acpi_idle_suspend
to 1 by acpi_processor_suspend() causes the ACPI cpuidle routines to
return error codes continuously, which in turn causes cpuidle to lock up
(hard).

However, acpi_idle_suspend doesn't appear to be useful for any
particular purpose (it's racy and doesn't really provide any real
protection), so it can be removed, which makes the problem go away.

Reported-and-tested-by: Tomas M. <tmezzadra@gmail.com>
Reported-and-tested-by: Ferenc Wagner <wferi@niif.hu>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-12 21:30:14 -02:00
Alex Deucher
a7c36fd8c5 drm/radeon/kms/combios: fix dynamic allocation of PM clock modes
I missed the combios path when I updated the atombios pm code.

Reported by amarsh04 on IRC.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-11-12 17:46:40 +00:00
Linus Torvalds
5b34b08996 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  arm/imx: fix imx6q mmc error when mounting rootfs
  arm/imx: fix AUTO_ZRELADDR selection
  arm/imx: fix the references to ARCH_MX3
  ARM: mx51/53: set pwm clock parent to ipg_perclk
  arm/tegra: enable headphone detection gpio on seaboard
  arm/dt: Fix ventana SDHCI power-gpios
  arm/tegra: Don't create duplicate gpio and pinmux devices
  ARM: at91: Fix USBA gadget registration
  atmel/spi: fix missing probe
  at91/yl-9200: Fix section mismatch
  at91: vmalloc fix missing AT91_VIRT_BASE define
  ARM: at91: usart: drop static map regs for dbgu
  ARM: picoxcell: add extra temp register to addruart
  ARM: msm: fix compilation flags for MSM_SCM
  arm/mxs: fix mmc device adding for mach-mx28evk
  ARM: mxc: Remove test_for_ltirq
  ARM:i.MX: fix build error in clock-mx51-mx53.c
  ARM:i.MX: fix build error in tzic/avic.c
  ARM: mxc: fix local timer interrupt handling
  msm: boards: Fix fallout from removal of machine_desc in fixup
2011-11-12 13:29:04 -02:00
Axel Lin
eb0b38a5d2 [CPUFREQ] db8500: fix build error due to undeclared i variable
The variable i is removed by commit ded8433
"[CPUFREQ] db8500: remove unneeded for loop iteration over freq_table",
but current code to print available frequencies still uses the i variable.
Thus add the i variable back to fix below buld error:

  CC      drivers/cpufreq/db8500-cpufreq.o
drivers/cpufreq/db8500-cpufreq.c: In function 'db8500_cpufreq_init':
drivers/cpufreq/db8500-cpufreq.c:123: error: 'i' undeclared (first use in this function)
drivers/cpufreq/db8500-cpufreq.c:123: error: (Each undeclared identifier is reported only once
drivers/cpufreq/db8500-cpufreq.c:123: error: for each function it appears in.)
make[2]: *** [drivers/cpufreq/db8500-cpufreq.o] Error 1
make[1]: *** [drivers/cpufreq] Error 2
make: *** [drivers] Error 2

This patch also fixes using uninitialized i variable as array index.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2011-11-11 22:28:33 -05:00
Linus Torvalds
8f042aa75a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: (29 commits)
  m68k/mac: Remove mac_irq_{en,dis}able() wrappers
  m68k/irq: Remove obsolete support for user vector interrupt fixups
  m68k/irq: Remove obsolete m68k irq framework
  m68k/q40: Convert Q40/Q60 to genirq
  m68k/sun3: Convert Sun3/3x to genirq
  m68k/sun3: Use the kstat_irqs_cpu() wrapper
  m68k/apollo: Convert Apollo to genirq
  m68k/vme: Convert VME to genirq
  m68k/hp300: Convert HP9000/300 and HP9000/400 to genirq
  m68k/mac: Optimize interrupts using chain handlers
  m68k/mac: Convert Mac to genirq
  m68k/amiga: Optimize interrupts using chain handlers
  m68k/amiga: Convert Amiga to genirq
  m68k/amiga: Refactor amiints.c
  m68k/atari: Remove code and comments about different irq types
  m68k/atari: Convert Atari to genirq
  m68k/irq: Add genirq support
  m68k/irq: Remove obsolete IRQ_FLG_* users
  m68k/irq: Rename {,__}m68k_handle_int()
  m68k/irq: Add m68k_setup_irq_controller()
  ...
2011-11-12 00:04:51 -02:00
Linus Torvalds
e6f1227e8b Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] v4l2-ctrl: Send change events to all fh for auto cluster slave controls
  [media] v4l2-event: Don't set sev->fh to NULL on unsubscribe
  [media] v4l2-event: Remove pending events from fh event queue when unsubscribing
  [media] v4l2-event: Deny subscribing with a type of V4L2_EVENT_ALL
  [media] MAINTAINERS: add a maintainer for s5p-mfc driver
  [media] v4l: s5p-mfc: fix reported capabilities
  [media] media: vb2: reset queued list on REQBUFS(0) call
  [media] media: vb2: set buffer length correctly for all buffer types
  [media] media: vb2: add a check for uninitialized buffer
  [media] mxl111sf: fix build warning
  [media] mxl111sf: remove pointless if condition in mxl111sf_config_spi
  [media] mxl111sf: check for errors after mxl111sf_write_reg in mxl111sf_idac_config
  [media] mxl111sf: fix return value of mxl111sf_idac_config
  [media] uvcvideo: GET_RES should only be checked for BITMAP type menu controls
2011-11-12 00:03:50 -02:00
Linus Torvalds
3455229fd6 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/kvm: Fix build failure with HV KVM and CBE
  powerpc/ps3: Fix lv1_gpu_attribute hcall
  powerpc/ps3: Fix PS3 repository build warnings
  powerpc/ps3: irq: Remove IRQF_DISABLED
  powerpc/irq: Remove IRQF_DISABLED
  powerpc/numa: NUMA topology support for PowerNV
  powerpc: Add System RAM to /proc/iomem
  powerpc: Add KVM as module to defconfigs
  powerpc/kvm: Fix build with older toolchains
  powerpc, tqm5200: update tqm5200_defconfig to fit for charon board.
  powerpc/5200: add support for charon board
2011-11-12 00:01:51 -02:00
Linus Torvalds
732783fea0 Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild: Fix missing system calls check on mips.
2011-11-11 23:59:40 -02:00
William Douglas
9f80d8b68f bma023: Add SFI translation for this device
This needed the sfi IRQ 0xFF fix to go in first. It simply plumbs in the
bma023 driver with the firmware naming of it.

Signed-off-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-11 23:58:58 -02:00
Feng Tang
57e6319dd6 vrtc: change its year offset from 1960 to 1972
Real world year equals the value in vrtc YEAR register plus an offset.
We used 1960 as the offset to make leap year consistent, but for a
device's first use, its YEAR register is 0 and the system year will
be parsed as 1960 which is not a valid UNIX time and will cause many
applications to fail mysteriously. So we use 1972 instead to fix this
issue.

Updated patch which adds a sanity check suggested by Mathias

This isn't a change in behaviour for systems, because 1972 is the one we
actually use. It's the old version in upstream which is out of sync with
all devices.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-11 23:58:58 -02:00
Zhang Rui
f2ee442115 ce4100: fix a build error
Fix a build error. CE4100 with no serial errors because the alternate
function is only a prototype not a null function as intended.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-11 23:58:58 -02:00
Linus Torvalds
87618e0003 Merge branch 'upstream-linus' of git://github.com/jgarzik/libata-dev
* 'upstream-linus' of git://github.com/jgarzik/libata-dev:
  pata_of_platform: Don't use NO_IRQ
  [libata] ahci: Add ASMedia ASM1061 support
  [libata] Issue SRST to Sil3726 PMP
  sata_sis.c: trivial spelling fix
  ahci_platform: use dev_get_platdata()
  [libata] libata-scsi.c: Add function parameter documentation
2011-11-11 23:55:01 -02:00