Commit Graph

15820 Commits

Author SHA1 Message Date
Ingo Molnar
b1fe9987b7 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

"The major changes for this series are:

 1.      Simplify RCU's grace-period and callback processing based on
         the new numbering for callbacks.  These were posted to LKML at
         https://lkml.org/lkml/2013/5/20/330.

 2.      Documentation updates.  These were posted to LKML at
         https://lkml.org/lkml/2013/5/20/348.

 3.      Miscellaneous fixes, including converting a few remaining printk()
         calls to pr_*().  These were posted to LKML at
         https://lkml.org/lkml/2013/5/20/324.

 4.      SRCU-related changes and fixes.  These were posted to LKML at
         https://lkml.org/lkml/2013/5/20/425.

 5.      Removal of TINY_PREEMPT_RCU in favor of TREE_PREEMPT_RCU for
         single-CPU low-latency systems.  These were posted to LKML at
         https://lkml.org/lkml/2013/5/20/427."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:48:57 +02:00
Linus Torvalds
d0ff934881 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS fixes from Al Viro:
 "Several fixes + obvious cleanup (you've missed a couple of open-coded
  can_lookup() back then)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  snd_pcm_link(): fix a leak...
  use can_lookup() instead of direct checks of ->i_op->lookup
  move exit_task_namespaces() outside of exit_notify()
  fput: task_work_add() can fail if the caller has passed exit_task_work()
  ncpfs: fix rmdir returns Device or resource busy
2013-06-14 19:18:56 -10:00
Oleg Nesterov
8aac62706a move exit_task_namespaces() outside of exit_notify()
exit_notify() does exit_task_namespaces() after
forget_original_parent(). This was needed to ensure that ->nsproxy
can't be cleared prematurely, an exiting child we are going to
reparent can do do_notify_parent() and use the parent's (ours) pid_ns.

However, after 32084504 "pidns: use task_active_pid_ns in
do_notify_parent" ->nsproxy != NULL is no longer needed, we rely
on task_active_pid_ns().

Move exit_task_namespaces() from exit_notify() to do_exit(), after
exit_fs() and before exit_task_work().

This solves the problem reported by Andrey, free_ipc_ns()->shm_destroy()
does fput() which needs task_work_add().

Note: this particular problem can be fixed if we change fput(), and
that change makes sense anyway. But there is another reason to move
the callsite. The original reason for exit_task_namespaces() from
the middle of exit_notify() was subtle and it has already gone away,
now this looks confusing. And this allows us do simplify exit_notify(),
we can avoid unlock/lock(tasklist) and we can use ->exit_state instead
of PF_EXITING in forget_original_parent().

Reported-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-15 05:39:08 +04:00
Linus Torvalds
cb7e9704d5 Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fixes from Paul McKenney:
 "I must confess that this past merge window was not RCU's best showing.
  This series contains three more fixes for RCU regressions:

   1.   A fix to __DECLARE_TRACE_RCU() that causes it to act as an
        interrupt from idle rather than as a task switch from idle.
        This change is needed due to the recent use of _rcuidle()
        tracepoints that can be invoked from interrupt handlers as well
        as from idle.  Without this fix, invoking _rcuidle() tracepoints
        from interrupt handlers results in splats and (more seriously)
        confusion on RCU's part as to whether a given CPU is idle or not.
        This confusion can in turn result in too-short grace periods and
        therefore random memory corruption.

   2.   A fix to a subtle deadlock that could result due to RCU doing
        a wakeup while holding one of its rcu_node structure's locks.
        Although the probability of occurrence is low, it really
        does happen.  The fix, courtesy of Steven Rostedt, uses
        irq_work_queue() to avoid the deadlock.

   3.   A fix to a silent deadlock (invisible to lockdep) due to the
        interaction of timeouts posted by RCU debug code enabled by
        CONFIG_PROVE_RCU_DELAY=y, grace-period initialization, and CPU
        hotplug operations.  This will not occur in production kernels,
        but really does occur in randconfig testing.  Diagnosis courtesy
        of Steven Rostedt"

* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration
  rcu: Don't call wakeup() with rcu_node structure ->lock held
  trace: Allow idle-safe tracepoints to be called from irq
2013-06-13 12:36:42 -07:00
Linus Torvalds
a568fa1c91 Merge branch 'akpm' (updates from Andrew Morton)
Merge misc fixes from Andrew Morton:
 "Bunch of fixes and one little addition to math64.h"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
  include/linux/math64.h: add div64_ul()
  mm: memcontrol: fix lockless reclaim hierarchy iterator
  frontswap: fix incorrect zeroing and allocation size for frontswap_map
  kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules()
  mm: migration: add migrate_entry_wait_huge()
  ocfs2: add missing lockres put in dlm_mig_lockres_handler
  mm/page_alloc.c: fix watermark check in __zone_watermark_ok()
  drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info()
  aio: fix io_destroy() regression by using call_rcu()
  rtc-at91rm9200: use shadow IMR on at91sam9x5
  rtc-at91rm9200: add shadow interrupt mask
  rtc-at91rm9200: refactor interrupt-register handling
  rtc-at91rm9200: add configuration support
  rtc-at91rm9200: add match-table compile guard
  fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory
  swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion
  drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree
  cciss: fix broken mutex usage in ioctl
  audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE
  drivers/rtc/rtc-cmos.c: fix accidentally enabling rtc channel
  ...
2013-06-12 16:29:53 -07:00
Chen Gang
736f3203a0 kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules()
audit_add_tree_rule() must set 'rule->tree = NULL;' firstly, to protect
the rule itself freed in kill_rules().

The reason is when it is killed, the 'rule' itself may have already
released, we should not access it.  one example: we add a rule to an
inode, just at the same time the other task is deleting this inode.

The work flow for adding a rule:

    audit_receive() -> (need audit_cmd_mutex lock)
      audit_receive_skb() ->
        audit_receive_msg() ->
          audit_receive_filter() ->
            audit_add_rule() ->
              audit_add_tree_rule() -> (need audit_filter_mutex lock)
                ...
                unlock audit_filter_mutex
                get_tree()
                ...
                iterate_mounts() -> (iterate all related inodes)
                  tag_mount() ->
                    tag_trunk() ->
                      create_trunk() -> (assume it is 1st rule)
                        fsnotify_add_mark() ->
                          fsnotify_add_inode_mark() ->  (add mark to inode->i_fsnotify_marks)
                        ...
                        get_tree(); (each inode will get one)
                ...
                lock audit_filter_mutex

The work flow for deleting an inode:

    __destroy_inode() ->
     fsnotify_inode_delete() ->
       __fsnotify_inode_delete() ->
        fsnotify_clear_marks_by_inode() ->  (get mark from inode->i_fsnotify_marks)
          fsnotify_destroy_mark() ->
           fsnotify_destroy_mark_locked() ->
             audit_tree_freeing_mark() ->
               evict_chunk() ->
                 ...
                 tree->goner = 1
                 ...
                 kill_rules() ->   (assume current->audit_context == NULL)
                   call_rcu() ->   (rule->tree != NULL)
                     audit_free_rule_rcu() ->
                       audit_free_rule()
                 ...
                 audit_schedule_prune() ->  (assume current->audit_context == NULL)
                   kthread_run() ->    (need audit_cmd_mutex and audit_filter_mutex lock)
                     prune_one() ->    (delete it from prue_list)
                       put_tree(); (match the original get_tree above)

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:46 -07:00
Oleg Nesterov
f000cfdde5 audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE
audit_log_start() does wait_for_auditd() in a loop until
audit_backlog_wait_time passes or audit_skb_queue has a room.

If signal_pending() is true this becomes a busy-wait loop, schedule() in
TASK_INTERRUPTIBLE won't block.

Thanks to Guy for fully investigating and explaining the problem.

(akpm: that'll cause the system to lock up on a non-preemptible
uniprocessor kernel)

(Guy: "Our customer was in fact running a uniprocessor machine, and they
reported a system hang.")

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Guy Streeter <streeter@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:45 -07:00
Kees Cook
637241a900 kmsg: honor dmesg_restrict sysctl on /dev/kmsg
The dmesg_restrict sysctl currently covers the syslog method for access
dmesg, however /dev/kmsg isn't covered by the same protections.  Most
people haven't noticed because util-linux dmesg(1) defaults to using the
syslog method for access in older versions.  With util-linux dmesg(1)
defaults to reading directly from /dev/kmsg.

To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:

 - /proc/kmsg allows:
  - open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
    single-reader interface (SYSLOG_ACTION_READ).
  - everything, after an open.

 - syslog syscall allows:
  - anything, if CAP_SYSLOG.
  - SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
    dmesg_restrict==0.
  - nothing else (EPERM).

The use-cases were:
 - dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
 - sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
   destructive SYSLOG_ACTION_READs.

AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.

Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.

To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.

 - /dev/kmsg allows:
  - open if CAP_SYSLOG or dmesg_restrict==0
  - reading/polling, after open

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192

[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:44 -07:00
Robin Holt
cf7df378aa reboot: rigrate shutdown/reboot to boot cpu
We recently noticed that reboot of a 1024 cpu machine takes approx 16
minutes of just stopping the cpus.  The slowdown was tracked to commit
f96972f2dc ("kernel/sys.c: call disable_nonboot_cpus() in
kernel_restart()").

The current implementation does all the work of hot removing the cpus
before halting the system.  We are switching to just migrating to the
boot cpu and then continuing with shutdown/reboot.

This also has the effect of not breaking x86's command line parameter
for specifying the reboot cpu.  Note, this code was shamelessly copied
from arch/x86/kernel/reboot.c with bits removed pertaining to the
reboot_cpu command line parameter.

Signed-off-by: Robin Holt <holt@sgi.com>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:44 -07:00
Srivatsa S. Bhat
16e53dbf10 CPU hotplug: provide a generic helper to disable/enable CPU hotplug
There are instances in the kernel where we would like to disable CPU
hotplug (from sysfs) during some important operation.  Today the freezer
code depends on this and the code to do it was kinda tailor-made for
that.

Restructure the code and make it generic enough to be useful for other
usecases too.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12 16:29:44 -07:00
Linus Torvalds
45d53766b9 Yoshihiro Yunomae fixed a regression in the output format when using
one of the counter clocks. The new multibuffer code changed the trace_clock
 file to update the trace instances tr->clock_id but the actual traces still
 used the value from the obsolete global variable trace_clock_id.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJRt8hEAAoJEOdOSU1xswtM6+EH/jAEuOrXhkkDqcua/paAOngw
 XWSaF9Cr6ozh4hutFrVSBi3AnsDrVo0adZmMVvLS9a7goyIUdYLfXbNeyK6Nvcq5
 bGXR5sJNpjtQ7snrmGX2KlXIGix28adi49eACi4qsGSG/jJYORYlgXcNBeXtENsb
 PKTXdQ8XEyc/h7Q51YQPHIVunf+zJSoepuXZ0myPLUWzLlPX9qoy5kETEpGhh9xh
 Ianb4wLo8dn6JuVGBuXQhZ/VzUHwJT1jJxR2JfnZ0bNVplNilnumAxY8f2zPOmzT
 lvIiQjCMRvNExFShuFh9WNnGRi62zYE9e0JJpYL4W9moIcbbEUvXYt2/imAVe9Q=
 =H+Wb
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Yoshihiro Yunomae fixed a regression in the output format when using
  one of the counter clocks.

  The new multibuffer code changed the trace_clock file to update the
  trace instances tr->clock_id but the actual traces still used the
  value from the obsolete global variable trace_clock_id"

* tag 'trace-fixes-v3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix outputting formats of x86-tsc and counter when use trace_clock
2013-06-12 08:29:11 -07:00
Yoshihiro YUNOMAE
58e8eedf18 tracing: Fix outputting formats of x86-tsc and counter when use trace_clock
Outputting formats of x86-tsc and counter should be a raw format, but after
applying the patch(2b6080f28c), the format was
changed to nanosec. This is because the global variable trace_clock_id was used.
When we use multiple buffers, clock_id of each sub-buffer should be used. Then,
this patch uses tr->clock_id instead of the global variable trace_clock_id.

[ Basically, this fixes a regression where the multibuffer code changed the
  trace_clock file to update tr->clock_id but the traces still use the old
  global trace_clock_id variable, negating the file's effect. The global
  trace_clock_id variable is obsolete and removed. - SR ]

Link: http://lkml.kernel.org/r/20130423013239.22334.7394.stgit@yunodevel

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-11 13:58:46 -04:00
Ben Greear
34376a50fb Fix lockup related to stop_machine being stuck in __do_softirq.
The stop machine logic can lock up if all but one of the migration
threads make it through the disable-irq step and the one remaining
thread gets stuck in __do_softirq.  The reason __do_softirq can hang is
that it has a bail-out based on jiffies timeout, but in the lockup case,
jiffies itself is not incremented.

To work around this, re-add the max_restart counter in __do_irq and stop
processing irqs after 10 restarts.

Thanks to Tejun Heo and Rusty Russell and others for helping me track
this down.

This was introduced in 3.9 by commit c10d73671a ("softirq: reduce
latencies").

It may be worth looking into ath9k to see if it has issues with its irq
handler at a later date.

The hang stack traces look something like this:

    ------------[ cut here ]------------
    WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
    Watchdog detected hard LOCKUP on cpu 2
    Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
    Pid: 23, comm: migration/2 Tainted: G         C   3.9.4+ #11
    Call Trace:
     <NMI>   warn_slowpath_common+0x85/0x9f
      warn_slowpath_fmt+0x46/0x48
      watchdog_overflow_callback+0x9c/0xa7
      __perf_event_overflow+0x137/0x1cb
      perf_event_overflow+0x14/0x16
      intel_pmu_handle_irq+0x2dc/0x359
      perf_event_nmi_handler+0x19/0x1b
      nmi_handle+0x7f/0xc2
      do_nmi+0xbc/0x304
      end_repeat_nmi+0x1e/0x2e
     <<EOE>>
      cpu_stopper_thread+0xae/0x162
      smpboot_thread_fn+0x258/0x260
      kthread+0xc7/0xcf
      ret_from_fork+0x7c/0xb0
    ---[ end trace 4947dfa9b0a4cec3 ]---
    BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
    Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
    irq event stamp: 835637905
    hardirqs last  enabled at (835637904): __do_softirq+0x9f/0x257
    hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80
    softirqs last  enabled at (5654720): __do_softirq+0x1ff/0x257
    softirqs last disabled at (5654725): irq_exit+0x5f/0xbb
    CPU 1
    Pid: 17, comm: migration/1 Tainted: G        WC   3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
    RIP: tasklet_hi_action+0xf0/0xf0
    Process migration/1
    Call Trace:
     <IRQ>
      __do_softirq+0x117/0x257
      irq_exit+0x5f/0xbb
      smp_apic_timer_interrupt+0x8a/0x98
      apic_timer_interrupt+0x72/0x80
     <EOI>
      printk+0x4d/0x4f
      stop_machine_cpu_stop+0x22c/0x274
      cpu_stopper_thread+0xae/0x162
      smpboot_thread_fn+0x258/0x260
      kthread+0xc7/0xcf
      ret_from_fork+0x7c/0xb0

Signed-off-by: Ben Greear <greearb@candelatech.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Pekka Riikonen <priikone@iki.fi>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-10 17:46:57 -07:00
Paul E. McKenney
be77f87c00 Merge branches 'cbnum.2013.06.10a', 'doc.2013.06.10a', 'fixes.2013.06.10a', 'srcu.2013.06.10a' and 'tiny.2013.06.10a' into HEAD
cbnum.2013.06.10a: Apply simplifications stemming from the new callback
	numbering.

doc.2013.06.10a: Documentation updates.

fixes.2013.06.10a: Miscellaneous fixes.

srcu.2013.06.10a: Updates to SRCU.

tiny.2013.06.10a: Eliminate TINY_PREEMPT_RCU.
2013-06-10 13:46:44 -07:00
Paul E. McKenney
1496144469 rcu: Shrink TINY_RCU by reworking CPU-stall ifdefs
TINY_RCU's reset_cpu_stall_ticks() and check_cpu_stalls() functions
are defined unconditionally, and are empty functions if CONFIG_RCU_TRACE
is disabled (which in turns disables detection of RCU CPU stalls).
This commit saves a few lines of source code by defining these functions
only if CONFIG_RCU_TRACE=y.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-06-10 13:45:53 -07:00
Paul E. McKenney
2439b696cb rcu: Shrink TINY_RCU by moving exit_rcu()
Now that TINY_PREEMPT_RCU is no more, exit_rcu() is always an empty
function.  But if TINY_RCU is going to have an empty function, it should
be in include/linux/rcutiny.h, where it does not bloat the kernel.
This commit therefore moves exit_rcu() out of kernel/rcupdate.c to
kernel/rcutree_plugin.h, and places a static inline empty function in
include/linux/rcutiny.h in order to shrink TINY_RCU a bit.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:45:52 -07:00
Paul E. McKenney
318bdcd959 rcu: Consolidate rcutiny_plugin.h ifdefs
This commit rearranges code in order to allow ifdefs to be consolidated
in kernel/rcutiny_plugin.h, simplifying the code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:45:52 -07:00
Paul E. McKenney
4879c84daa rcu: Remove check_cpu_stall_preempt()
With the removal of CONFIG_TINY_PREEMPT_RCU, check_cpu_stall_preempt()
is now an empty function.  This commit therefore eliminates it by
inlining it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:45:51 -07:00
Paul E. McKenney
9dc5ad3248 rcu: Simplify RCU_TINY RCU callback invocation
TINY_PREEMPT_RCU could use a kthread to handle RCU callback invocation,
which required an API to abstract kthread vs. softirq invocation.
Now that TINY_PREEMPT_RCU is no longer with us, this commit retires
this API in favor of direct use of the relevant softirq primitives.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:45:51 -07:00
Paul E. McKenney
58c4e69d43 rcu: Remove rcu_preempt_process_callbacks()
With the removal of CONFIG_TINY_PREEMPT_RCU, rcu_preempt_process_callbacks()
is now an empty function.  This commit therefore eliminates it by
inlining it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:45:50 -07:00
Paul E. McKenney
47d65935a7 rcu: Remove rcu_preempt_remove_callbacks()
With the removal of CONFIG_TINY_PREEMPT_RCU, rcu_preempt_remove_callbacks()
is now an empty function.  This commit therefore eliminates it by
inlining it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:45:50 -07:00
Paul E. McKenney
9acaac8ced rcu: Remove rcu_preempt_check_callbacks()
With the removal of CONFIG_TINY_PREEMPT_RCU, rcu_preempt_check_callbacks()
is now an empty function.  This commit therefore eliminates it by
inlining it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:45:50 -07:00
Paul E. McKenney
221304e95e rcu: Remove show_tiny_preempt_stats()
With the removal of CONFIG_TINY_PREEMPT_RCU, show_tiny_preempt_stats()
is now an empty function.  This commit therefore eliminates it by
inlining it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:45:49 -07:00
Paul E. McKenney
127781d1ba rcu: Remove TINY_PREEMPT_RCU
TINY_PREEMPT_RCU adds significant code and complexity, but does not
offer commensurate benefits.  People currently using TINY_PREEMPT_RCU
can get much better memory footprint with TINY_RCU, or, if they really
need preemptible RCU, they can use TREE_PREEMPT_RCU with a relatively
minor degradation in memory footprint.  Please note that this move
has been widely publicized on LKML (https://lkml.org/lkml/2012/11/12/545)
and on LWN (http://lwn.net/Articles/541037/).

This commit therefore removes TINY_PREEMPT_RCU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Updated to eliminate #else in rcutiny.h as suggested by Josh ]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:45:49 -07:00
Paul E. McKenney
99f88919f8 rcu: Remove srcu_read_lock_raw() and srcu_read_unlock_raw().
These interfaces never did get used, so this commit removes them,
their rcutorture tests, and documentation referencing them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:45:25 -07:00
Paul E. McKenney
4982969d96 rcu: Merge adjacent identical ifdefs
Two ifdefs in kernel/rcupdate.c now have identical conditions with
nothing between them, so the commit merges them into a single ifdef.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:44:56 -07:00
Paul E. McKenney
026ad2835c rcu: Drive quiescent-state-forcing delay from HZ
Systems with HZ=100 can have slow bootup times due to the default
three-jiffy delays between quiescent-state forcing attempts.  This
commit therefore auto-tunes the RCU_JIFFIES_TILL_FORCE_QS value based
on the value of HZ.  However, this would break very large systems that
require more time between quiescent-state forcing attempts.  This
commit therefore also ups the default delay by one jiffy for each
256 CPUs that might be on the system (based off of nr_cpu_ids at
runtime, -not- NR_CPUS at build time).

Updated to collapse #ifdefs for RCU_JIFFIES_TILL_FORCE_QS into a
step-function definition as suggested by Josh Triplett.

Reported-by: Paul Mackerras <paulus@au1.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-06-10 13:44:56 -07:00
Paul E. McKenney
9a5739d73f rcu: Remove "Experimental" flags
After a release or two, features are no longer experimental.  Therefore,
this commit removes the "Experimental" tag from them.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:44:56 -07:00
Paul E. McKenney
05eb552bf5 rcu: Move redundant call to note_gp_changes() into called function
The __rcu_process_callbacks() invokes note_gp_changes() immediately
before invoking rcu_check_quiescent_state(), which conditionally
invokes that same function.  This commit therefore eliminates the
call to note_gp_changes() in __rcu_process_callbacks() in favor of
making unconditional to call from rcu_check_quiescent_state() to
note_gp_changes().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:39:44 -07:00
Paul E. McKenney
ce3d9c03d1 rcu: Inline trivial wrapper function rcu_start_gp_per_cpu()
Given the changes that introduce note_gp_change(), rcu_start_gp_per_cpu()
is now a trivial wrapper function with only one caller.  This commit
therefore inlines it into its sole call site.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:39:44 -07:00
Paul E. McKenney
63274cfb94 rcu: Eliminate check_for_new_grace_period() wrapper function
One of the calls to check_for_new_grace_period() is now redundant due to
an immediately preceding call to note_gp_changes().  Eliminating this
redundant call leaves a single caller, which is simpler if inlined.
This commit therefore eliminates the redundant call and inlines the
body of check_for_new_grace_period() into the single remaining call site.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:39:44 -07:00
Paul E. McKenney
ba9fbe955f rcu: Merge __rcu_process_gp_end() into __note_gp_changes()
This commit eliminates some duplicated code by merging
__rcu_process_gp_end() into __note_gp_changes().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:39:43 -07:00
Paul E. McKenney
470716fc04 rcu: Switch callers from rcu_process_gp_end() to note_gp_changes()
Because note_gp_changes() now incorporates rcu_process_gp_end() function,
this commit switches to the former and eliminates the latter.  In
addition, this commit changes external calls from __rcu_process_gp_end()
to __note_gp_changes().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:39:43 -07:00
Paul E. McKenney
efc151c33b rcu: Convert rcutree_plugin.h printk calls
This commit converts printk() calls to the corresponding pr_*() calls.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:39:42 -07:00
Paul E. McKenney
d34ea3221a rcu: Rename note_new_gpnum() to note_gp_changes()
Because note_new_gpnum() now also checks for the ends of old grace periods,
this commit changes its name to note_gp_changes().  Later commits will merge
rcu_process_gp_end() into note_gp_changes().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:39:42 -07:00
Paul E. McKenney
398ebe6000 rcu: Make __note_new_gpnum() check for ends of prior grace periods
The current implementation can detect the beginning of a new grace period
before noting the end of a previous grace period.  Although the current
implementation correctly handles this sort of nonsense, it would be
good to reduce RCU's state space by making such nonsense unnecessary,
which is now possible thanks to the fact that RCU's callback groups are
now numbered.

This commit therefore makes __note_new_gpnum() invoke
__rcu_process_gp_end() in order to note the ends of prior grace
periods before noting the beginnings of new grace periods.
Of course, this now means that note_new_gpnum() notes both the
beginnings and ends of grace periods, and could therefore be
used in place of rcu_process_gp_end().  But that is a job for
later commits.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:39:42 -07:00
Paul E. McKenney
6eaef633d7 rcu: Move code to apply callback-numbering simplifications
The addition of callback numbering allows combining the detection of the
ends of old grace periods and the beginnings of new grace periods.  This
commit moves code to set the stage for this combining.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:39:42 -07:00
Paul E. McKenney
d7f3e20739 rcu: Convert rcutree.c printk calls
This commit converts printk() calls to the corresponding pr_*() calls.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-06-10 13:39:41 -07:00
Paul E. McKenney
971394f389 rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration
In Steven Rostedt's words:

> I've been debugging the last couple of days why my tests have been
> locking up. One of my tracing tests, runs all available tracers. The
> lockup always happened with the mmiotrace, which is used to trace
> interactions between priority drivers and the kernel. But to do this
> easily, when the tracer gets registered, it disables all but the boot
> CPUs. The lockup always happened after it got done disabling the CPUs.
>
> Then I decided to try this:
>
> while :; do
> 	for i in 1 2 3; do
> 		echo 0 > /sys/devices/system/cpu/cpu$i/online
> 	done
> 	for i in 1 2 3; do
> 		echo 1 > /sys/devices/system/cpu/cpu$i/online
> 	done
> done
>
> Well, sure enough, that locked up too, with the same users. Doing a
> sysrq-w (showing all blocked tasks):
>
> [ 2991.344562]   task                        PC stack   pid father
> [ 2991.344562] rcu_preempt     D ffff88007986fdf8     0    10      2 0x00000000
> [ 2991.344562]  ffff88007986fc98 0000000000000002 ffff88007986fc48 0000000000000908
> [ 2991.344562]  ffff88007986c280 ffff88007986ffd8 ffff88007986ffd8 00000000001d3c80
> [ 2991.344562]  ffff880079248a40 ffff88007986c280 0000000000000000 00000000fffd4295
> [ 2991.344562] Call Trace:
> [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
> [ 2991.344562]  [<ffffffff81541750>] schedule_timeout+0xbc/0xf9
> [ 2991.344562]  [<ffffffff8154bec0>] ? ftrace_call+0x5/0x2f
> [ 2991.344562]  [<ffffffff81049513>] ? cascade+0xa8/0xa8
> [ 2991.344562]  [<ffffffff815417ab>] schedule_timeout_uninterruptible+0x1e/0x20
> [ 2991.344562]  [<ffffffff810c980c>] rcu_gp_kthread+0x502/0x94b
> [ 2991.344562]  [<ffffffff81062791>] ? __init_waitqueue_head+0x50/0x50
> [ 2991.344562]  [<ffffffff810c930a>] ? rcu_gp_fqs+0x64/0x64
> [ 2991.344562]  [<ffffffff81061cdb>] kthread+0xb1/0xb9
> [ 2991.344562]  [<ffffffff81091e31>] ? lock_release_holdtime.part.23+0x4e/0x55
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562]  [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562] kworker/0:1     D ffffffff81a30680     0    47      2 0x00000000
> [ 2991.344562] Workqueue: events cpuset_hotplug_workfn
> [ 2991.344562]  ffff880078dbbb58 0000000000000002 0000000000000006 00000000000000d8
> [ 2991.344562]  ffff880078db8100 ffff880078dbbfd8 ffff880078dbbfd8 00000000001d3c80
> [ 2991.344562]  ffff8800779ca5c0 ffff880078db8100 ffffffff81541fcf 0000000000000000
> [ 2991.344562] Call Trace:
> [ 2991.344562]  [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
> [ 2991.344562]  [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24
> [ 2991.344562]  [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50
> [ 2991.344562]  [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50
> [ 2991.344562]  [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40
> [ 2991.344562]  [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50
> [ 2991.344562]  [<ffffffff810af7e6>] rebuild_sched_domains_locked+0x6e/0x3a8
> [ 2991.344562]  [<ffffffff810b0ec6>] rebuild_sched_domains+0x1c/0x2a
> [ 2991.344562]  [<ffffffff810b109b>] cpuset_hotplug_workfn+0x1c7/0x1d3
> [ 2991.344562]  [<ffffffff810b0ed9>] ? cpuset_hotplug_workfn+0x5/0x1d3
> [ 2991.344562]  [<ffffffff81058e07>] process_one_work+0x2d4/0x4d1
> [ 2991.344562]  [<ffffffff81058d3a>] ? process_one_work+0x207/0x4d1
> [ 2991.344562]  [<ffffffff8105964c>] worker_thread+0x2e7/0x3b5
> [ 2991.344562]  [<ffffffff81059365>] ? rescuer_thread+0x332/0x332
> [ 2991.344562]  [<ffffffff81061cdb>] kthread+0xb1/0xb9
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562]  [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0
> [ 2991.344562]  [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58
> [ 2991.344562] bash            D ffffffff81a4aa80     0  2618   2612 0x10000000
> [ 2991.344562]  ffff8800379abb58 0000000000000002 0000000000000006 0000000000000c2c
> [ 2991.344562]  ffff880077fea140 ffff8800379abfd8 ffff8800379abfd8 00000000001d3c80
> [ 2991.344562]  ffff8800779ca5c0 ffff880077fea140 ffffffff81541fcf 0000000000000000
> [ 2991.344562] Call Trace:
> [ 2991.344562]  [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff815437ba>] schedule+0x64/0x66
> [ 2991.344562]  [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24
> [ 2991.344562]  [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609
> [ 2991.344562]  [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e
> [ 2991.344562]  [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e
> [ 2991.344562]  [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40
> [ 2991.344562]  [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e
> [ 2991.344562]  [<ffffffff81091c99>] ? __lock_is_held+0x32/0x53
> [ 2991.344562]  [<ffffffff81548912>] notifier_call_chain+0x6b/0x98
> [ 2991.344562]  [<ffffffff810671fd>] __raw_notifier_call_chain+0xe/0x10
> [ 2991.344562]  [<ffffffff8103cf64>] __cpu_notify+0x20/0x32
> [ 2991.344562]  [<ffffffff8103cf8d>] cpu_notify_nofail+0x17/0x36
> [ 2991.344562]  [<ffffffff815225de>] _cpu_down+0x154/0x259
> [ 2991.344562]  [<ffffffff81522710>] cpu_down+0x2d/0x3a
> [ 2991.344562]  [<ffffffff81526351>] store_online+0x4e/0xe7
> [ 2991.344562]  [<ffffffff8134d764>] dev_attr_store+0x20/0x22
> [ 2991.344562]  [<ffffffff811b3c5f>] sysfs_write_file+0x108/0x144
> [ 2991.344562]  [<ffffffff8114c5ef>] vfs_write+0xfd/0x158
> [ 2991.344562]  [<ffffffff8114c928>] SyS_write+0x5c/0x83
> [ 2991.344562]  [<ffffffff8154c494>] tracesys+0xdd/0xe2
>
> As well as held locks:
>
> [ 3034.728033] Showing all locks held in the system:
> [ 3034.728033] 1 lock held by rcu_preempt/10:
> [ 3034.728033]  #0:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff810c9471>] rcu_gp_kthread+0x167/0x94b
> [ 3034.728033] 4 locks held by kworker/0:1/47:
> [ 3034.728033]  #0:  (events){.+.+.+}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1
> [ 3034.728033]  #1:  (cpuset_hotplug_work){+.+.+.}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1
> [ 3034.728033]  #2:  (cpuset_mutex){+.+.+.}, at: [<ffffffff810b0ec1>] rebuild_sched_domains+0x17/0x2a
> [ 3034.728033]  #3:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50
> [ 3034.728033] 1 lock held by mingetty/2563:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2565:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2569:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2572:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 1 lock held by mingetty/2575:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
> [ 3034.728033] 7 locks held by bash/2618:
> [ 3034.728033]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff8114bc3f>] file_start_write+0x2a/0x2c
> [ 3034.728033]  #1:  (&buffer->mutex#2){+.+.+.}, at: [<ffffffff811b3b93>] sysfs_write_file+0x3c/0x144
> [ 3034.728033]  #2:  (s_active#54){.+.+.+}, at: [<ffffffff811b3c3e>] sysfs_write_file+0xe7/0x144
> [ 3034.728033]  #3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff810217c2>] cpu_hotplug_driver_lock+0x17/0x19
> [ 3034.728033]  #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8103d196>] cpu_maps_update_begin+0x17/0x19
> [ 3034.728033]  #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103cfd8>] cpu_hotplug_begin+0x2c/0x6d
> [ 3034.728033]  #6:  (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e
> [ 3034.728033] 1 lock held by bash/2980:
> [ 3034.728033]  #0:  (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8
>
> Things looked a little weird. Also, this is a deadlock that lockdep did
> not catch. But what we have here does not look like a circular lock
> issue:
>
> Bash is blocked in rcu_cpu_notify():
>
> 1961		/* Exclude any attempts to start a new grace period. */
> 1962		mutex_lock(&rsp->onoff_mutex);
>
>
> kworker is blocked in get_online_cpus(), which makes sense as we are
> currently taking down a CPU.
>
> But rcu_preempt is not blocked on anything. It is simply sleeping in
> rcu_gp_kthread (really rcu_gp_init) here:
>
> 1453	#ifdef CONFIG_PROVE_RCU_DELAY
> 1454			if ((prandom_u32() % (rcu_num_nodes * 8)) == 0 &&
> 1455			    system_state == SYSTEM_RUNNING)
> 1456				schedule_timeout_uninterruptible(2);
> 1457	#endif /* #ifdef CONFIG_PROVE_RCU_DELAY */
>
> And it does this while holding the onoff_mutex that bash is waiting for.
>
> Doing a function trace, it showed me where it happened:
>
> [  125.940066] rcu_pree-10      3.... 28384115273: schedule_timeout_uninterruptible <-rcu_gp_kthread
> [...]
> [  125.940066] rcu_pree-10      3d..3 28384202439: sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=120 prev_state=D ==> next_comm=watchdog/3 next_pid=38 next_prio=120
>
> The watchdog ran, and then:
>
> [  125.940066] watchdog-38      3d..3 28384692863: sched_switch: prev_comm=watchdog/3 prev_pid=38 prev_prio=120 prev_state=P ==> next_comm=modprobe next_pid=2848 next_prio=118
>
> Not sure what modprobe was doing, but shortly after that:
>
> [  125.940066] modprobe-2848    3d..3 28385041749: sched_switch: prev_comm=modprobe prev_pid=2848 prev_prio=118 prev_state=R+ ==> next_comm=migration/3 next_pid=40 next_prio=0
>
> Where the migration thread took down the CPU:
>
> [  125.940066] migratio-40      3d..3 28389148276: sched_switch: prev_comm=migration/3 prev_pid=40 prev_prio=0 prev_state=P ==> next_comm=swapper/3 next_pid=0 next_prio=120
>
> which finally did:
>
> [  125.940066]   <idle>-0       3...1 28389282142: arch_cpu_idle_dead <-cpu_startup_entry
> [  125.940066]   <idle>-0       3...1 28389282548: native_play_dead <-arch_cpu_idle_dead
> [  125.940066]   <idle>-0       3...1 28389282924: play_dead_common <-native_play_dead
> [  125.940066]   <idle>-0       3...1 28389283468: idle_task_exit <-play_dead_common
> [  125.940066]   <idle>-0       3...1 28389284644: amd_e400_remove_cpu <-play_dead_common
>
>
> CPU 3 is now offline, the rcu_preempt thread that ran on CPU 3 is still
> doing a schedule_timeout_uninterruptible() and it registered it's
> timeout to the timer base for CPU 3. You would think that it would get
> migrated right? The issue here is that the timer migration happens at
> the CPU notifier for CPU_DEAD. The problem is that the rcu notifier for
> CPU_DOWN is blocked waiting for the onoff_mutex to be released, which is
> held by the thread that just put itself into a uninterruptible sleep,
> that wont wake up until the CPU_DEAD notifier of the timer
> infrastructure is called, which wont happen until the rcu notifier
> finishes. Here's our deadlock!

This commit breaks this deadlock cycle by substituting a shorter udelay()
for the previous schedule_timeout_uninterruptible(), while at the same
time increasing the probability of the delay.  This maintains the intensity
of the testing.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-10 13:37:12 -07:00
Steven Rostedt
016a8d5be6 rcu: Don't call wakeup() with rcu_node structure ->lock held
This commit fixes a lockdep-detected deadlock by moving a wake_up()
call out from a rnp->lock critical section.  Please see below for
the long version of this story.

On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:

> [12572.705832] ======================================================
> [12572.750317] [ INFO: possible circular locking dependency detected ]
> [12572.796978] 3.10.0-rc3+ #39 Not tainted
> [12572.833381] -------------------------------------------------------
> [12572.862233] trinity-child17/31341 is trying to acquire lock:
> [12572.870390]  (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12572.878859]
> but task is already holding lock:
> [12572.894894]  (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> [12572.903381]
> which lock already depends on the new lock.
>
> [12572.927541]
> the existing dependency chain (in reverse order) is:
> [12572.943736]
> -> #4 (&ctx->lock){-.-...}:
> [12572.960032]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12572.968337]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12572.976633]        [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0
> [12572.984969]        [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0
> [12572.993326]        [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0
> [12573.001652]        [<ffffffff816eacfe>] schedule_user+0x2e/0x70
> [12573.009998]        [<ffffffff816ecd64>] retint_careful+0x12/0x2e
> [12573.018321]
> -> #3 (&rq->lock){-.-.-.}:
> [12573.034628]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.042930]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.051248]        [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260
> [12573.059579]        [<ffffffff810492f5>] do_fork+0x105/0x470
> [12573.067880]        [<ffffffff81049686>] kernel_thread+0x26/0x30
> [12573.076202]        [<ffffffff816cee63>] rest_init+0x23/0x140
> [12573.084508]        [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe
> [12573.092852]        [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c
> [12573.101233]        [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf
> [12573.109528]
> -> #2 (&p->pi_lock){-.-.-.}:
> [12573.125675]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.133829]        [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.141964]        [<ffffffff8108e881>] try_to_wake_up+0x31/0x320
> [12573.150065]        [<ffffffff8108ebe2>] default_wake_function+0x12/0x20
> [12573.158151]        [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40
> [12573.166195]        [<ffffffff81085398>] __wake_up_common+0x58/0x90
> [12573.174215]        [<ffffffff81086909>] __wake_up+0x39/0x50
> [12573.182146]        [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.190119]        [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> [12573.198023]        [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> [12573.205860]        [<ffffffff8107a91d>] kthread+0xed/0x100
> [12573.213656]        [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
> [12573.221379]
> -> #1 (&rsp->gp_wq){..-.-.}:
> [12573.236329]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.243783]        [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.251178]        [<ffffffff810868f3>] __wake_up+0x23/0x50
> [12573.258505]        [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.265891]        [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> [12573.273248]        [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> [12573.280564]        [<ffffffff8107a91d>] kthread+0xed/0x100
> [12573.287807]        [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0

Notice the above call chain.

rcu_start_future_gp() is called with the rnp->lock held. Then it calls
rcu_start_gp_advance, which does a wakeup.

You can't do wakeups while holding the rnp->lock, as that would mean
that you could not do a rcu_read_unlock() while holding the rq lock, or
any lock that was taken while holding the rq lock. This is because...
(See below).

> [12573.295067]
> -> #0 (rcu_node_0){..-.-.}:
> [12573.309293]        [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> [12573.316568]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.323825]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.331081]        [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12573.338377]        [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> [12573.345648]        [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> [12573.352942]        [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> [12573.360211]        [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> [12573.367514]        [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> [12573.374816]        [<ffffffff816f4dd4>] tracesys+0xdd/0xe2

Notice the above trace.

perf took its own ctx->lock, which can be taken while holding the rq
lock. While holding this lock, it did a rcu_read_unlock(). The
perf_lock_task_context() basically looks like:

rcu_read_lock();
raw_spin_lock(ctx->lock);
rcu_read_unlock();

Now, what looks to have happened, is that we scheduled after taking that
first rcu_read_lock() but before taking the spin lock. When we scheduled
back in and took the ctx->lock, the following rcu_read_unlock()
triggered the "special" code.

The rcu_read_unlock_special() takes the rnp->lock, which gives us a
possible deadlock scenario.

	CPU0		CPU1		CPU2
	----		----		----

				     rcu_nocb_kthread()
    lock(rq->lock);
		    lock(ctx->lock);
				     lock(rnp->lock);

				     wake_up();

				     lock(rq->lock);

		    rcu_read_unlock();

		    rcu_read_unlock_special();

		    lock(rnp->lock);
    lock(ctx->lock);

**** DEADLOCK ****

> [12573.382068]
> other info that might help us debug this:
>
> [12573.403229] Chain exists of:
>   rcu_node_0 --> &rq->lock --> &ctx->lock
>
> [12573.424471]  Possible unsafe locking scenario:
>
> [12573.438499]        CPU0                    CPU1
> [12573.445599]        ----                    ----
> [12573.452691]   lock(&ctx->lock);
> [12573.459799]                                lock(&rq->lock);
> [12573.467010]                                lock(&ctx->lock);
> [12573.474192]   lock(rcu_node_0);
> [12573.481262]
>  *** DEADLOCK ***
>
> [12573.501931] 1 lock held by trinity-child17/31341:
> [12573.508990]  #0:  (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> [12573.516475]
> stack backtrace:
> [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
> [12573.545357]  ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
> [12573.552868]  ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
> [12573.560353]  0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
> [12573.567856] Call Trace:
> [12573.575011]  [<ffffffff816e375b>] dump_stack+0x19/0x1b
> [12573.582284]  [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f
> [12573.589637]  [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> [12573.596982]  [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100
> [12573.604344]  [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.611652]  [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.619030]  [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.626331]  [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.633671]  [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12573.640992]  [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0
> [12573.648330]  [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40
> [12573.655662]  [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0
> [12573.662964]  [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> [12573.670276]  [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> [12573.677622]  [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370
> [12573.684981]  [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> [12573.692358]  [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> [12573.699753]  [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50
> [12573.707135]  [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [12573.714599]  [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> [12573.721996]  [<ffffffff816f4dd4>] tracesys+0xdd/0xe2

This commit delays the wakeup via irq_work(), which is what
perf and ftrace use to perform wakeups in critical sections.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-06-10 13:37:11 -07:00
Linus Torvalds
81db4dbf59 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:

 - Trivial: unused variable removal

 - Posix-timers: Add the clock ID to the new proc interface to make it
   useful.  The interface is new and should be functional when we reach
   the final 3.10 release.

 - Cure a false positive warning in the tick code introduced by the
   overhaul in 3.10

 - Fix for a persistent clock detection regression introduced in this
   cycle

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Correct run-time detection of persistent_clock.
  ntp: Remove unused variable flags in __hardpps
  posix-timers: Show clock ID in proc file
  tick: Cure broadcast false positive pending bit warning
2013-06-08 15:51:21 -07:00
Linus Torvalds
c3e58a7945 irqdomain bug fixes for v3.10-rc4
This branch contains a set of straight forward bug fixes to the
 irqdomain code and to a couple of drivers that make use of it.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRs51nAAoJEEFnBt12D9kBmnUP/RCgaTn5biRD0tC6OCGwvsZr
 YKypc71cZtbO3CTk1Sw2jgDUoW+2FWwtbwKWCmrHIaulRuxoeMHLbpc6fEGFRAjG
 ENnSEuSJkYV2T5ZoYjM5mAjotHUcszxZ9uOz7ovCUY72GO/+tfJ97NT9+CCpPfWV
 Wa/i08/91UPbWP1ASfMLXVzqO9uqEYvrrvY2PSqJ/g0BkzbybAg38u6IycZkGW4u
 /mjglx5fYRhcQgl7o1FDaw97AGjbykt2mgP7EK3R24BxvEy4gmn4IzGo9duOf7Y2
 b1tEfro/keRoibuKehPWdKTvpda80DUJjrsOwmNveZHTWlSB8GZXqCEmOmTHngrV
 gNX6MUVZClUvKiQCDo3ibyZUmIuUnnlRee6WqQzr2VsMiwct449Gg81zwXX+Yn7O
 5KOnlyicJur3f4HqQSKEA2CXU6RRCmk2iqCFMqtutxy20cmm3LoW7OM7rFF7tzix
 g6czKZiX+yKwoP2E2EQ2mYM8cirKeEyPhs4EUnKJJOVVZqOCtHkrKnkbSoithsS3
 we6Isj8KM8NQ3fgeFsbcxV+ezK3moIzD0fYr3Q6x25VZLYrYH7XpUix0nlGYxCOK
 vlEpCaMes/IG/+SKElf8fPoxs0qlOYPvYZBrLjUGCG/VB01bNsj0mjKYm1va+f6v
 n3zQbGS7X+TiiHQ+EFL0
 =wZCk
 -----END PGP SIGNATURE-----

Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux

Pull irqdomain bug fixes from Grant Likely:
 "This branch contains a set of straight forward bug fixes to the
  irqdomain code and to a couple of drivers that make use of it."

* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux:
  irqchip: Return -EPERM for reserved IRQs
  irqdomain: document the simple domain first_irq
  kernel/irq/irqdomain.c: before use 'irq_data', need check it whether valid.
  irqdomain: export irq_domain_add_simple
2013-06-08 15:50:42 -07:00
Linus Walleij
94a63da0ac irqdomain: document the simple domain first_irq
The first_irq needs to be zero to get a linear domain and that
comes with special semantics. We want to simplify this going
forward but some documentation never hurts.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
2013-06-08 21:15:09 +01:00
Chen Gang
275e31b10c kernel/irq/irqdomain.c: before use 'irq_data', need check it whether valid.
Since irq_data may be NULL, if so, we WARN_ON(), and continue, 'hwirq'
which related with 'irq_data' has to initialize later, or it will cause
issue.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
2013-06-08 21:15:09 +01:00
Arnd Bergmann
346dbb79ea irqdomain: export irq_domain_add_simple
All other irq_domain_add_* functions are exported already, and apparently
this one got left out by mistake, which causes build errors for ARM
allmodconfig kernels:

ERROR: "irq_domain_add_simple" [drivers/gpio/gpio-rcar.ko] undefined!
ERROR: "irq_domain_add_simple" [drivers/gpio/gpio-em.ko] undefined!

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
2013-06-08 21:15:08 +01:00
Linus Torvalds
14d0ee0517 This contains 4 fixes.
The first two fix the case where full RCU debugging is enabled, enabling
 function tracing causes a live lock of the system. This is due to the added
 debug checks in rcu_dereference_raw() that is used by the function tracer.
 These checks are also traced by the function tracer as well as cause enough
 overhead to the function tracer to slow down the system enough that
 the time to finish an interrupt can take longer than when the next
 interrupt is triggered, causing a live lock from the timer interrupt.
 
 Talking this over with Paul McKenney, we came up with a fix that adds
 a new rcu_dereference_raw_notrace() that does not perform these added checks,
 and let the function tracer use that.
 
 The third commit fixes a failed compile when branch tracing is enabled,
 due to the conversion of the trace_test_buffer() selftest that the
 branch trace wasn't converted for.
 
 The forth patch fixes a bug caught by the RCU lockdep code where a
 rcu_read_lock() is performed when rcu is disabled (either going to
 or from idle, or user space). This happened on the irqsoff tracer
 as it calls task_uid(). The fix here was to use current_uid() when
 possible that doesn't use rcu locking. Which luckily, is always used
 when irqsoff calls this code.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJRsQZhAAoJEOdOSU1xswtMquIH/0zyrqrLTnkc5MsNnnJ8kH5R
 z1cULts4FqBTUNZ1hdb3BTOu4zywjREIkWfM9qqpBmq9Mq6PBxX7gxWTqYvD4jiX
 EatiiCKa7Fyddx4iHJNfvtWgKVYt9WKSNeloRugS9h7NxIZ1wpz21DUpENFQzW2f
 jWRnq/AKXFmZ0vn1953mPePtRsg61RYpb7DCkTE1gtUnvL43wMd/Mo6p6BLMEG26
 1dDK6EWO/uewl8A4oP5JZYP+AP5Ckd4x1PuQK682AtQw+8S6etaGfeJr0WZmKQoD
 0aDZ/NXXSNKChlUFGJusBNJCWryONToa+sdiKuk1h/lW/k9Mail/FChiHBzMiwk=
 =uvlD
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.10-rc3-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "This contains 4 fixes.

  The first two fix the case where full RCU debugging is enabled,
  enabling function tracing causes a live lock of the system.  This is
  due to the added debug checks in rcu_dereference_raw() that is used by
  the function tracer.  These checks are also traced by the function
  tracer as well as cause enough overhead to the function tracer to slow
  down the system enough that the time to finish an interrupt can take
  longer than when the next interrupt is triggered, causing a live lock
  from the timer interrupt.

  Talking this over with Paul McKenney, we came up with a fix that adds
  a new rcu_dereference_raw_notrace() that does not perform these added
  checks, and let the function tracer use that.

  The third commit fixes a failed compile when branch tracing is
  enabled, due to the conversion of the trace_test_buffer() selftest
  that the branch trace wasn't converted for.

  The forth patch fixes a bug caught by the RCU lockdep code where a
  rcu_read_lock() is performed when rcu is disabled (either going to or
  from idle, or user space).  This happened on the irqsoff tracer as it
  calls task_uid().  The fix here was to use current_uid() when possible
  that doesn't use rcu locking.  Which luckily, is always used when
  irqsoff calls this code."

* tag 'trace-fixes-v3.10-rc3-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Use current_uid() for critical time tracing
  tracing: Fix bad parameter passed in branch selftest
  ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends
  rcu: Add _notrace variation of rcu_dereference_raw() and hlist_for_each_entry_rcu()
2013-06-07 18:46:51 -07:00
Steven Rostedt (Red Hat)
f17a519485 tracing: Use current_uid() for critical time tracing
The irqsoff tracer records the max time that interrupts are disabled.
There are hooks in the assembly code that calls back into the tracer when
interrupts are disabled or enabled.

When they are enabled, the tracer checks if the amount of time they
were disabled is larger than the previous recorded max interrupts off
time. If it is, it creates a snapshot of the currently running trace
to store where the last largest interrupts off time was held and how
it happened.

During testing, this RCU lockdep dump appeared:

[ 1257.829021] ===============================
[ 1257.829021] [ INFO: suspicious RCU usage. ]
[ 1257.829021] 3.10.0-rc1-test+ #171 Tainted: G        W
[ 1257.829021] -------------------------------
[ 1257.829021] /home/rostedt/work/git/linux-trace.git/include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
[ 1257.829021]
[ 1257.829021] other info that might help us debug this:
[ 1257.829021]
[ 1257.829021]
[ 1257.829021] RCU used illegally from idle CPU!
[ 1257.829021] rcu_scheduler_active = 1, debug_locks = 0
[ 1257.829021] RCU used illegally from extended quiescent state!
[ 1257.829021] 2 locks held by trace-cmd/4831:
[ 1257.829021]  #0:  (max_trace_lock){......}, at: [<ffffffff810e2b77>] stop_critical_timing+0x1a3/0x209
[ 1257.829021]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810dae5a>] __update_max_tr+0x88/0x1ee
[ 1257.829021]
[ 1257.829021] stack backtrace:
[ 1257.829021] CPU: 3 PID: 4831 Comm: trace-cmd Tainted: G        W    3.10.0-rc1-test+ #171
[ 1257.829021] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
[ 1257.829021]  0000000000000001 ffff880065f49da8 ffffffff8153dd2b ffff880065f49dd8
[ 1257.829021]  ffffffff81092a00 ffff88006bd78680 ffff88007add7500 0000000000000003
[ 1257.829021]  ffff88006bd78680 ffff880065f49e18 ffffffff810daebf ffffffff810dae5a
[ 1257.829021] Call Trace:
[ 1257.829021]  [<ffffffff8153dd2b>] dump_stack+0x19/0x1b
[ 1257.829021]  [<ffffffff81092a00>] lockdep_rcu_suspicious+0x109/0x112
[ 1257.829021]  [<ffffffff810daebf>] __update_max_tr+0xed/0x1ee
[ 1257.829021]  [<ffffffff810dae5a>] ? __update_max_tr+0x88/0x1ee
[ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021]  [<ffffffff810dbf85>] update_max_tr_single+0x11d/0x12d
[ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021]  [<ffffffff810e2b15>] stop_critical_timing+0x141/0x209
[ 1257.829021]  [<ffffffff8109569a>] ? trace_hardirqs_on+0xd/0xf
[ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021]  [<ffffffff810e3057>] time_hardirqs_on+0x2a/0x2f
[ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021]  [<ffffffff8109550c>] trace_hardirqs_on_caller+0x16/0x197
[ 1257.829021]  [<ffffffff8109569a>] trace_hardirqs_on+0xd/0xf
[ 1257.829021]  [<ffffffff811002b9>] user_enter+0xfd/0x107
[ 1257.829021]  [<ffffffff810029b4>] do_notify_resume+0x92/0x97
[ 1257.829021]  [<ffffffff8154bdca>] int_signal+0x12/0x17

What happened was entering into the user code, the interrupts were enabled
and a max interrupts off was recorded. The trace buffer was saved along with
various information about the task: comm, pid, uid, priority, etc.

The uid is recorded with task_uid(tsk). But this is a macro that uses rcu_read_lock()
to retrieve the data, and this happened to happen where RCU is blind (user_enter).

As only the preempt and irqs off tracers can have this happen, and they both
only have the tsk == current, if tsk == current, use current_uid() instead of
task_uid(), as current_uid() does not use RCU as only current can change its uid.

This fixes the RCU suspicious splat.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-06 12:35:30 -04:00
Linus Torvalds
7d80fea426 Merge branch 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:

 - Fix for yet another xattr bug which may lead to NULL deref.

 - A subtle bug in for_each_descendant_pre().  This bug requires quite
   specific conditions to trigger and isn't too likely to actually
   happen in the wild, but maybe that just makes it that much more
   nastier.

 - A warning message added for silly cgroup re-mount (not -o remount,
   but unmount followed by mount) behavior.

* 'for-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: warn about mismatching options of a new mount of an existing hierarchy
  cgroup: fix a subtle bug in descendant pre-order walk
  cgroup: initialize xattr before calling d_instantiate()
2013-06-03 17:57:16 +09:00
Linus Torvalds
484b002e28 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:

 - Three EFI-related fixes

 - Two early memory initialization fixes

 - build fix for older binutils

 - fix for an eager FPU performance regression -- currently we don't
   allow the use of the FPU at interrupt time *at all* in eager mode,
   which is clearly wrong.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Allow FPU to be used at interrupt time even with eagerfpu
  x86, crc32-pclmul: Fix build with older binutils
  x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
  x86, range: fix missing merge during add range
  x86, efi: initial the local variable of DataSize to zero
  efivar: fix oops in efivar_update_sysfs_entries() caused by memory reuse
  efivarfs: Never return ENOENT from firmware again
2013-05-31 09:44:10 +09:00
Steven Rostedt (Red Hat)
0184d50f9f tracing: Fix bad parameter passed in branch selftest
The branch selftest calls trace_test_buffer(), but with the new code
it expects the first parameter to be a pointer to a struct trace_buffer.
All self tests were changed but the branch selftest was missed.

This caused either a crash or failed test when the branch selftest was
enabled.

Link: http://lkml.kernel.org/r/20130529141333.GA24064@localhost

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-29 16:00:03 -04:00