linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-25 21:17:50 +07:00

Author	SHA1	Message	Date
Paul E. McKenney	f2dbe4a562	rcu: Localize rcu_state ->orphan_pend and ->orphan_done Given that the rcu_state structure's ->orphan_pend and ->orphan_done fields are used only during migration of callbacks from the recently offlined CPU to a surviving CPU, if rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() are combined, these fields can become local variables in the combined function. This commit therefore combines rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() into a new rcu_segcblist_merge() function and removes the ->orphan_pend and ->orphan_done fields. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-07-25 13:04:49 -07:00
Paul E. McKenney	21cc248384	rcu: Advance callbacks after migration When migrating callbacks from a newly offlined CPU, we are already holding the root rcu_node structure's lock, so it costs almost nothing to advance and accelerate the newly migrated callbacks. This patch therefore makes this advancing and acceleration happen. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-07-25 13:04:48 -07:00
Paul E. McKenney	537b85c870	rcu: Eliminate rcu_state ->orphan_lock The ->orphan_lock is acquired and released only within the rcu_migrate_callbacks() function, which now acquires the root rcu_node structure's ->lock. This commit therefore eliminates the ->orphan_lock in favor of the root rcu_node structure's ->lock. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-07-25 13:04:48 -07:00
Paul E. McKenney	9fa46fb8c9	rcu: Advance outgoing CPU's callbacks before migrating them It is possible that the outgoing CPU is unaware of recent grace periods, and so it is also possible that some of its pending callbacks are actually ready to be invoked. The current callback-migration code would needlessly force these callbacks to pass through another grace period. This commit therefore invokes rcu_advance_cbs() on the outgoing CPU's callbacks in order to give them full credit for having passed through any recent grace periods. This also fixes an odd theoretical bug where there are no callbacks in the system except for those on the outgoing CPU, none of those callbacks have yet been associated with a grace-period number, there is never again another callback registered, and the surviving CPU never again takes a scheduling-clock interrupt, never goes idle, and never enters nohz_full userspace execution. Yes, this is (just barely) possible. It requires that the surviving CPU be a nohz_full CPU, that its scheduler-clock interrupt be shut off, and that it loop forever in the kernel. You get bonus points if you can make this one happen! ;-) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-07-25 13:04:47 -07:00
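What "advancing" callbacks means can be pictured with a minimal userspace sketch of a four-segment callback list, loosely modeled on the DONE/WAIT/NEXT_READY/NEXT segments of RCU's segmented callback lists. Everything here (seglist, seglist_accelerate(), seglist_advance()) is hypothetical and single-threaded; the "accelerate" step is deliberately simplified, and the real code also uses wraparound-safe grace-period comparisons and compacts segments, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a segmented callback list; not kernel code. */
enum { SEG_DONE, SEG_WAIT, SEG_NEXT_READY, SEG_NEXT, NSEGS };

struct cb {
	struct cb *next;
};

struct seglist {
	struct cb *head;
	struct cb **tails[NSEGS]; /* tails[i]: ->next pointer that ends segment i */
	unsigned long gp[NSEGS];  /* GP number that WAIT/NEXT_READY wait for */
};

static void seglist_init(struct seglist *l)
{
	l->head = NULL;
	for (int i = 0; i < NSEGS; i++) {
		l->tails[i] = &l->head;
		l->gp[i] = 0;
	}
}

/* New callbacks always go into the NEXT segment. */
static void seglist_enqueue(struct seglist *l, struct cb *c)
{
	c->next = NULL;
	*l->tails[SEG_NEXT] = c;
	l->tails[SEG_NEXT] = &c->next;
}

/* Simplified accelerate: every not-yet-done callback waits for GP g. */
static void seglist_accelerate(struct seglist *l, unsigned long g)
{
	l->tails[SEG_WAIT] = l->tails[SEG_NEXT];
	l->tails[SEG_NEXT_READY] = l->tails[SEG_NEXT];
	l->gp[SEG_WAIT] = g;
}

/* Move segments whose GP has completed into DONE
 * (plain comparison, not wraparound-safe). */
static void seglist_advance(struct seglist *l, unsigned long completed)
{
	int i;

	for (i = SEG_WAIT; i < SEG_NEXT; i++) {
		if (l->tails[i] == l->tails[i - 1])
			continue;                     /* empty segment */
		if (l->gp[i] > completed)
			break;                        /* still waiting */
		l->tails[SEG_DONE] = l->tails[i]; /* absorb into DONE */
	}
	/* Segments absorbed into DONE are now empty. */
	for (int j = SEG_WAIT; j < i; j++)
		l->tails[j] = l->tails[SEG_DONE];
}

/* Number of callbacks currently in segment seg. */
static int seg_count(struct seglist *l, int seg)
{
	struct cb **pp = seg == SEG_DONE ? &l->head : l->tails[seg - 1];
	int n = 0;

	for (; pp != l->tails[seg]; pp = &(*pp)->next)
		n++;
	return n;
}
```

For example, two callbacks assigned to grace period 4 stay out of DONE after advancing with completed number 2, and move to DONE once grace period 4 has completed, while a callback enqueued afterwards stays in NEXT. Advancing before migration gives the outgoing CPU's callbacks exactly this kind of credit.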
Paul E. McKenney	b1a2d79fe7	rcu: Make NOCB CPUs migrate CBs directly from outgoing CPU RCU's CPU-hotplug callback-migration code first moves the outgoing CPU's callbacks to ->orphan_done and ->orphan_pend, and only then moves them to the NOCB callback list. This commit avoids the extra step (and simplifies the code) by moving the callbacks directly from the outgoing CPU's callback list to the NOCB callback list. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-07-25 13:04:47 -07:00
Paul E. McKenney	95335c0355	rcu: Check for NOCB CPUs and empty lists earlier in CB migration The current CPU-hotplug RCU-callback-migration code checks for the source (newly offlined) CPU being a NOCBs CPU down in rcu_send_cbs_to_orphanage(). This commit simplifies callback migration a bit by moving this check up to rcu_migrate_callbacks(). This commit also adds a check for the source CPU having no callbacks, which eases analysis of the rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() functions. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-07-25 13:04:46 -07:00
Paul E. McKenney	c47e067a3c	rcu: Remove orphan/adopt event-tracing fields The rcu_node structure's ->n_cbs_orphaned and ->n_cbs_adopted fields are updated, but never read. This commit therefore removes them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-07-25 13:04:46 -07:00
Paul E. McKenney	313517fc44	rcu: Make expedited GPs correctly handle hardware CPU insertion The updates of ->expmaskinitnext and of ->ncpus are unsynchronized, with the value of ->ncpus being incremented long before the corresponding ->expmaskinitnext mask is updated. If an RCU expedited grace period sees ->ncpus change, it will update the ->expmaskinit masks from the new ->expmaskinitnext masks. But it is possible that ->ncpus has already been updated, but the ->expmaskinitnext masks still have their old values. For the current expedited grace period, no harm done. The CPU could not have been online before the grace period started, so there is no need to wait for its non-existent pre-existing readers. But the next RCU expedited grace period is in a world of hurt. The value of ->ncpus has already been updated, so this grace period will assume that the ->expmaskinitnext masks have not changed. But they have, and they won't be taken into account until the next never-been-online CPU comes online. This means that RCU will be ignoring some CPUs that it should be paying attention to. The solution is to update ->ncpus and ->expmaskinitnext while holding the ->lock for the rcu_node structure containing the ->expmaskinitnext mask. Because smp_store_release() is now used to update ->ncpus and smp_load_acquire() is now used to locklessly read it, if the expedited grace period sees ->ncpus change, then the updating CPU has to already be holding the corresponding ->lock. Therefore, when the expedited grace period later acquires that ->lock, it is guaranteed to see the new value of ->expmaskinitnext. On the other hand, if the expedited grace period loads ->ncpus just before an update, earlier full memory barriers guarantee that the incoming CPU isn't far enough along to be running any RCU readers. This commit therefore makes the required change. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-07-25 13:04:45 -07:00
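The publish/observe ordering can be sketched with C11 atomics: the incoming CPU sets its mask bit before publishing the new count with a release store, so an expedited grace period that observes the new count through an acquire load also observes the new bit. This is a hypothetical single-node model (cpu_starting_model(), exp_reset_tree_model() and the globals are invented names); it omits the rcu_node ->lock that the commit also relies on, and it only illustrates the pairing, not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical single-node model of the ->ncpus / ->expmaskinitnext pairing. */
static unsigned long expmaskinitnext; /* written before ncpus is published */
static unsigned long expmaskinit;     /* mask the expedited GP actually uses */
static atomic_ulong ncpus;            /* release store / acquire load */
static unsigned long ncpus_snap;      /* count seen by the previous expedited GP */

/* Incoming CPU: set its bit first, then publish the new count with
 * release ordering so that the bit is visible to any acquiring reader. */
static void cpu_starting_model(int cpu)
{
	expmaskinitnext |= 1UL << cpu;
	unsigned long n = atomic_load_explicit(&ncpus, memory_order_relaxed);
	atomic_store_explicit(&ncpus, n + 1, memory_order_release);
}

/* Expedited GP: rescan the masks only if the count changed.
 * Returns 1 if expmaskinit was refreshed, 0 otherwise. */
static int exp_reset_tree_model(void)
{
	unsigned long n = atomic_load_explicit(&ncpus, memory_order_acquire);

	if (n == ncpus_snap)
		return 0;                  /* no new CPUs since last time */
	ncpus_snap = n;
	expmaskinit = expmaskinitnext; /* acquire pairs with the release above */
	return 1;
}
```

The design point is the ordering: if the count were published before the mask bit, a reader could record the new count, copy a stale mask, and then never rescan, which is the failure the commit describes.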
Paul E. McKenney	a58163d8ca	rcu: Migrate callbacks earlier in the CPU-offline timeline RCU callbacks must be migrated away from an outgoing CPU, and this is done near the end of the CPU-hotplug operation, after the outgoing CPU is long gone. Unfortunately, this means that other CPU-hotplug callbacks can execute while the outgoing CPU's callbacks are still immobilized on the long-gone CPU's callback lists. If any of these CPU-hotplug callbacks must wait, either directly or indirectly, for the invocation of any of the immobilized RCU callbacks, the system will hang. This commit avoids such hangs by migrating the callbacks away from the outgoing CPU immediately upon its departure, shortly after the return from __cpu_die() in takedown_cpu(). Thus, RCU is able to advance these callbacks and invoke them, which allows all the after-the-fact CPU-hotplug callbacks to wait on these RCU callbacks without risk of a hang. While in the neighborhood, this commit also moves rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() under a pre-existing #ifdef to avoid including dead code on the one hand and to avoid define-without-use warnings on the other hand. Reported-by: Jeffrey Hugo <jhugo@codeaurora.org> Link: http://lkml.kernel.org/r/db9c91f6-1b17-6136-84f0-03c3c2581ab4@codeaurora.org Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Richard Weinberger <richard@nod.at>	2017-07-25 13:03:43 -07:00
Paul E. McKenney	96036c4306	rcu: Add last-CPU to GP-kthread starvation messages This commit augments the grace-period-kthread starvation debugging messages by adding the last CPU that ran the kthread. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-07-24 16:04:18 -07:00
Paul E. McKenney	fe5ac724d8	rcu: Remove nohz_full full-system-idle state machine The NO_HZ_FULL_SYSIDLE full-system-idle capability was added in 2013 by commit `0edd1b1784` ("nohz_full: Add full-system-idle state machine"), but has not been used. This commit therefore removes it. If it turns out to be needed later, this commit can always be reverted. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-06-08 18:52:39 -07:00
Paul E. McKenney	f7a10a9750	rcu: Remove the RCU_KTHREAD_PRIO Kconfig option Anything that can be done with the RCU_KTHREAD_PRIO Kconfig option can also be done with the rcutree.kthread_prio kernel boot parameter. This commit therefore removes this Kconfig option. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rik van Riel <riel@redhat.com>	2017-06-08 18:52:39 -07:00
Paul E. McKenney	90040c9e30	rcu: Remove _SLOW_ Kconfig options The RCU_TORTURE_TEST_SLOW_PREINIT, RCU_TORTURE_TEST_SLOW_PREINIT_DELAY, RCU_TORTURE_TEST_SLOW_INIT, RCU_TORTURE_TEST_SLOW_INIT_DELAY, RCU_TORTURE_TEST_SLOW_CLEANUP, and RCU_TORTURE_TEST_SLOW_CLEANUP_DELAY Kconfig options are only useful for torture testing, and there are the rcutree.gp_cleanup_delay, rcutree.gp_init_delay, and rcutree.gp_preinit_delay kernel boot parameters that rcutorture can use instead. The effect of these parameters is to artificially slow down grace period initialization and cleanup in order to make some types of race conditions happen more often. This commit therefore simplifies Tree RCU a bit by removing the Kconfig options and adding the corresponding kernel parameters to rcutorture's .boot files instead. However, this commit also leaves out the kernel parameters for TREE02, TREE04, and TREE07 in order to have about the same number of tests slowed as not slowed. TREE01, TREE03, TREE05, and TREE06 are slowed, and the rest are not slowed. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-06-08 18:52:38 -07:00
Paul E. McKenney	fa3c664769	rcu: Improve __call_rcu() debug-objects error message The "__call_rcu(): Leaked duplicate callback" error message from __call_rcu() has proven to be unhelpful. This commit therefore changes it to "__call_rcu(): Double-freed CB" and adds the value of the pointer passed in. The value of the pointer improves debuggability by allowing correlation with tracing output, for example, the rcu:rcu_callback trace event. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-06-08 18:52:31 -07:00
Paul E. McKenney	791875d16e	rcu: Eliminate the unused __rcu_is_watching() function The __rcu_is_watching() function is currently not used, other than to implement the rcu_is_watching() function. This commit therefore eliminates __rcu_is_watching(), which has the beneficial side-effect of shrinking include/linux/rcupdate.h a bit. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-06-08 18:52:30 -07:00
Paul E. McKenney	a68a2bb28b	rcu: Move docbook comments out of rcupdate.h The include/linux/rcupdate.h file is included by more than 200 files, so shrinking it should provide some build-time benefits. This commit therefore moves several docbook comments from rcupdate.h to kernel/rcu/update.c, kernel/rcu/tree.c, and kernel/rcu/tree_plugin.h, thus reducing the number of times that the compiler has to scan these comments. This likely provides only a small benefit, but every little bit helps. This commit also fixes a malformed bulleted list noted by the 0day Test Robot. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-06-08 18:52:27 -07:00
Paul E. McKenney	c0b334c5bf	rcu: Add lockdep_assert_held() teeth to tree.c Comments can be helpful, but assertions carry more force. This commit therefore adds lockdep_assert_held() and RCU_LOCKDEP_WARN() calls to enforce lock-held and interrupt-disabled preconditions. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-06-08 08:25:37 -07:00
Paul E. McKenney	17c7798bea	rcu: Update rcu_bootup_announce_oddness() This commit updates rcu_bootup_announce_oddness() to check additional Kconfig options and module/boot parameters. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-06-08 08:25:35 -07:00
Paul E. McKenney	f4687d2637	rcu: Add preemptibility checks in rcu_sched_qs() and rcu_bh_qs() This commit adds WARN_ON_ONCE() calls that trigger if either rcu_sched_qs() or rcu_bh_qs() is invoked with preemption enabled. In the immortal words of Peter Zijlstra: "these are much harder to ignore than comments". Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-06-08 08:25:34 -07:00
Paul E. McKenney	e28371c891	rcu: Remove obsolete reference to synchronize_kernel() The synchronize_kernel() primitive was removed in favor of synchronize_sched() more than a decade ago, and it seems likely that rather few kernel hackers are familiar with it. Its continued presence is therefore providing more confusion than enlightenment. This commit therefore removes the reference from the synchronize_sched() header comment, and adds the corresponding information to the synchronize_rcu() header comment. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-06-08 08:25:25 -07:00
Paul E. McKenney	5b72f9643b	rcu: Complain if blocking in preemptible RCU read-side critical section Although preemptible RCU allows its read-side critical sections to be preempted, general blocking is forbidden. The reason for this is that excessive preemption times can be handled by CONFIG_RCU_BOOST=y, but a voluntarily blocked task doesn't care how high you boost its priority. Because preemptible RCU is a global mechanism, one ill-behaved reader hurts everyone. Hence the prohibition against general blocking in RCU-preempt read-side critical sections. Preemption yes, blocking no. This commit enforces this prohibition. There is a special exception for the -rt patchset (which they kindly volunteered to implement): It is OK to block (as opposed to merely being preempted) within an RCU-preempt read-side critical section, but only if the blocking is subject to priority inheritance. This exception permits CONFIG_RCU_BOOST=y to get -rt RCU readers out of trouble. Why doesn't this exception also apply to mainline's rt_mutex? Because of the possibility that someone does general blocking while holding an rt_mutex. Yes, the priority boosting will affect the rt_mutex, but it won't help with the task doing general blocking while holding that rt_mutex. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-06-08 08:25:24 -07:00
Paul E. McKenney	f92c734f02	rcu: Prevent rcu_barrier() from starting needless grace periods Currently rcu_barrier() uses call_rcu() to enqueue new callbacks on each CPU with a non-empty callback list. This works, but means that rcu_barrier() forces grace periods that are not otherwise needed. The key point is that rcu_barrier() never needs to wait for a grace period, but instead only for all pre-existing callbacks to be invoked. This means that rcu_barrier()'s new callbacks should be placed in the callback-list segment containing the last pre-existing callback. This commit makes this change using the new rcu_segcblist_entrain() function. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-06-08 08:25:22 -07:00
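The placement rule can be illustrated with a minimal, hypothetical model of a segmented callback list: seglist_entrain() below appends the new callback right after the last pre-existing one, in that callback's segment, and falls back to the DONE segment when the list is empty. The segment names mirror the kernel's, but the struct layout and function names are assumptions for illustration, not the rcu_segcblist implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segmented callback list; segment names mirror the kernel's. */
enum { DONE_TAIL, WAIT_TAIL, NEXT_READY_TAIL, NEXT_TAIL, NSEGS };

struct cb {
	struct cb *next;
};

struct seglist {
	struct cb *head;
	struct cb **tails[NSEGS]; /* tails[i]: ->next pointer that ends segment i */
};

static void seglist_init(struct seglist *l)
{
	l->head = NULL;
	for (int i = 0; i < NSEGS; i++)
		l->tails[i] = &l->head;
}

/* call_rcu()-style enqueue: a new callback needs a future grace period. */
static void seglist_enqueue(struct seglist *l, struct cb *c)
{
	c->next = NULL;
	*l->tails[NEXT_TAIL] = c;
	l->tails[NEXT_TAIL] = &c->next;
}

/* Entrain: place c right after the last pre-existing callback, in that
 * callback's segment, so c needs no grace period beyond what that
 * callback already needs.  An empty list puts c in DONE. */
static void seglist_entrain(struct seglist *l, struct cb *c)
{
	int i;

	c->next = NULL;
	for (i = NEXT_TAIL; i > DONE_TAIL; i--)
		if (l->tails[i] != l->tails[i - 1])
			break;            /* segment i is the last non-empty one */
	*l->tails[i] = c;             /* segments after i are empty */
	for (; i <= NEXT_TAIL; i++)
		l->tails[i] = &c->next;
}

/* Number of callbacks currently in segment seg. */
static int seg_count(struct seglist *l, int seg)
{
	struct cb **pp = seg == DONE_TAIL ? &l->head : l->tails[seg - 1];
	int n = 0;

	for (; pp != l->tails[seg]; pp = &(*pp)->next)
		n++;
	return n;
}
```

If the last pre-existing callback is waiting on the current grace period (WAIT segment), the barrier callback joins it there instead of landing in NEXT, which is what keeps rcu_barrier() from requesting a grace period of its own.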
Linus Torvalds	de4d195308	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes are: - Debloat RCU headers - Parallelize SRCU callback handling (plus overlapping patches) - Improve the performance of Tree SRCU on a CPU-hotplug stress test - Documentation updates - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits) rcu: Open-code the rcu_cblist_n_lazy_cbs() function rcu: Open-code the rcu_cblist_n_cbs() function rcu: Open-code the rcu_cblist_empty() function rcu: Separately compile large rcu_segcblist functions srcu: Debloat the <linux/rcu_segcblist.h> header srcu: Adjust default auto-expediting holdoff srcu: Specify auto-expedite holdoff time srcu: Expedite first synchronize_srcu() when idle srcu: Expedited grace periods with reduced memory contention srcu: Make rcutorture writer stalls print SRCU GP state srcu: Exact tracking of srcu_data structures containing callbacks srcu: Make SRCU be built by default srcu: Fix Kconfig botch when SRCU not selected rcu: Make non-preemptive schedule be Tasks RCU quiescent state srcu: Expedite srcu_schedule_cbs_snp() callback invocation srcu: Parallelize callback handling kvm: Move srcu_struct fields to end of struct kvm rcu: Fix typo in PER_RCU_NODE_PERIOD header comment rcu: Use true/false in assignment to bool rcu: Use bool value directly ...	2017-05-10 10:30:46 -07:00
Paul E. McKenney	933dfbd7c4	rcu: Open-code the rcu_cblist_n_lazy_cbs() function Because the rcu_cblist_n_lazy_cbs() just samples the ->len_lazy counter, and because the rcu_cblist structure is quite straightforward, it makes sense to open-code rcu_cblist_n_lazy_cbs(p) as p->len_lazy, cutting out a level of indirection. This commit makes this change. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2017-05-02 09:22:48 -07:00
Paul E. McKenney	4b27f20b40	rcu: Open-code the rcu_cblist_n_cbs() function Because the rcu_cblist_n_cbs() just samples the ->len counter, and because the rcu_cblist structure is quite straightforward, it makes sense to open-code rcu_cblist_n_cbs(p) as p->len, cutting out a level of indirection. This commit makes this change. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2017-05-02 09:21:59 -07:00
Paul E. McKenney	8ef0f37efb	rcu: Open-code the rcu_cblist_empty() function Because the rcu_cblist_empty() just samples the ->head pointer, and because the rcu_cblist structure is quite straightforward, it makes sense to open-code rcu_cblist_empty(p) as !p->head, cutting out a level of indirection. This commit makes this change. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2017-05-02 08:18:40 -07:00
Paul E. McKenney	7f6733c3c6	srcu: Make rcutorture writer stalls print SRCU GP state In the past, SRCU was simple enough that there was little point in making the rcutorture writer stall messages print the SRCU grace-period number state. With the advent of Tree SRCU, this has changed. This commit therefore makes Classic, Tiny, and Tree SRCU report this state to rcutorture as needed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Mike Galbraith <efault@gmx.de>	2017-04-26 11:23:28 -07:00
Paul E. McKenney	f2094107ac	Merge branches 'doc.2017.04.12a', 'fixes.2017.04.19a' and 'srcu.2017.04.21a' into HEAD doc.2017.04.12a: Documentation updates fixes.2017.04.19a: Miscellaneous fixes srcu.2017.04.21a: Parallelize SRCU callback handling	2017-04-21 06:00:13 -07:00
Paul E. McKenney	bcbfdd01dc	rcu: Make non-preemptive schedule be Tasks RCU quiescent state Currently, a call to schedule() acts as a Tasks RCU quiescent state only if a context switch actually takes place. However, just the call to schedule() guarantees that the calling task has moved off of whatever tracing trampoline that it might have been on previously. This commit therefore plumbs schedule()'s "preempt" parameter into rcu_note_context_switch(), which then records the Tasks RCU quiescent state, but only if this call to schedule() was -not- due to a preemption. To avoid adding overhead to the common-case context-switch path, this commit hides the rcu_note_context_switch() check under an existing non-common-case check. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-21 05:59:27 -07:00
Paul E. McKenney	da915ad5cf	srcu: Parallelize callback handling Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1,2]; however, there are workloads that could result in a high volume of concurrent invocations of call_srcu(), which with current SRCU would result in excessive lock contention on the srcu_struct structure's ->queue_lock, which protects SRCU's callback lists. This commit therefore moves SRCU to per-CPU callback lists, thus greatly reducing contention. Because a given SRCU instance no longer has a single centralized callback list, starting grace periods and invoking callbacks are both more complex than in the single-list Classic SRCU implementation. Starting grace periods and handling callbacks are now handled using an srcu_node tree that is in some ways similar to the rcu_node trees used by RCU-bh, RCU-preempt, and RCU-sched (for example, the srcu_node tree shape is controlled by exactly the same Kconfig options and boot parameters that control the shape of the rcu_node tree). In addition, the old per-CPU srcu_array structure is now named srcu_data and contains an rcu_segcblist structure named ->srcu_cblist for its callbacks (and a spinlock to protect this). The srcu_struct gets an srcu_gp_seq that is used to associate callback segments with the corresponding completion-time grace-period number. These completion-time grace-period numbers are propagated up the srcu_node tree so that the grace-period workqueue handler can determine whether additional grace periods are needed on the one hand and where to look for callbacks that are ready to be invoked on the other. The srcu_barrier() function must now wait on all instances of the per-CPU ->srcu_cblist. Because each ->srcu_cblist is protected by ->lock, srcu_barrier() can remotely add the needed callbacks. In theory, it could also remotely start grace periods, but in practice doing so is complex and racy. 
And interestingly enough, it is never necessary for srcu_barrier() to start a grace period because srcu_barrier() only enqueues a callback when a callback is already present--and it turns out that a grace period has to have already been started for this pre-existing callback. Furthermore, it is only the callback that srcu_barrier() needs to wait on, not any particular grace period. Therefore, a new rcu_segcblist_entrain() function enqueues the srcu_barrier() function's callback into the same segment occupied by the last pre-existing callback in the list. The special case where all the pre-existing callbacks are on a different list (because they are in the process of being invoked) is handled by enqueuing srcu_barrier()'s callback into the RCU_DONE_TAIL segment, relying on the done-callbacks check that takes place after all callbacks are invoked. Note that the readers use the same algorithm as before. Note that there is a separate srcu_idx that tells the readers what counter to increment. This unfortunately cannot be combined with srcu_gp_seq because they need to be incremented at different times. This commit introduces some ugly #ifdefs in rcutorture. These will go away when I feel good enough about Tree SRCU to ditch Classic SRCU. Some crude performance comparisons, courtesy of a quickly hacked rcuperf asynchronous-grace-period capability:

Callback Queuing Overhead
-------------------------

 # CPUS   Classic SRCU   Tree SRCU
 ------   ------------   ---------
      2       0.349 us    0.342 us
     16       31.66 us      0.4 us
     41      ---------    0.417 us

The times are the 90th percentiles, a statistic that was chosen to reject the overheads of the occasional srcu_barrier() call needed to avoid OOMing the test machine. The rcuperf test hangs when running Classic SRCU at 41 CPUs, hence the line of dashes. Despite the hacks to both the rcuperf code and the statistics, this is a convincing demonstration of Tree SRCU's performance and scalability advantages. 
[1] https://lwn.net/Articles/309030/ [2] https://patchwork.kernel.org/patch/5108281/ Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Fix initialization if synchronize_srcu_expedited() called first. ]	2017-04-21 05:59:26 -07:00
Paul E. McKenney	bfd090be14	rcu: Fix typo in PER_RCU_NODE_PERIOD header comment This commit just changes a "the the" to "the" to reduce repetition. Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-19 09:29:20 -07:00
Nicholas Mc Guire	50dc7def4a	rcu: Use bool value directly The beenonline variable is declared bool so there is no need for an explicit comparison, especially not against the constant zero. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-19 09:29:19 -07:00
Paul E. McKenney	deb34f3643	rcu: Improve comments for hotplug/suspend/hibernate functions Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-19 09:29:18 -07:00
Paul E. McKenney	d1e4f01d09	rcu: Remove obsolete comment from rcu_future_gp_cleanup() header The rcu_nocb_gp_cleanup() function is now invoked elsewhere, so this commit drags this comment into the year 2017. Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-19 09:29:17 -07:00
Paul E. McKenney	e95d68d212	srcu: Make num_rcu_lvl[] array be external This commit makes the num_rcu_lvl[] array external so that SRCU can make use of it for initializing its upcoming srcu_node tree. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-18 11:38:21 -07:00
Paul E. McKenney	41f5c63178	rcu: Remove redundant levelcnt[] array from rcu_init_one() The levelcnt[] array is identical to num_rcu_lvl[], so this commit removes levelcnt[]. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-18 11:38:21 -07:00
Paul E. McKenney	2b34c43cc1	srcu: Move rcu_init_levelspread() to rcu_tree_node.h This commit moves the rcu_init_levelspread() function from kernel/rcu/tree.c to kernel/rcu/rcu.h so that SRCU can access it. This is another step towards enabling SRCU to create its own combining tree. This commit is code-movement only, give or take knock-on adjustments. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-18 11:38:20 -07:00
Paul E. McKenney	2e8c28c2dd	srcu: Move rcu_seq_start() and friends to rcu.h This commit moves rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(), and rcu_seq_done() from kernel/rcu/tree.c to kernel/rcu/rcu.h. This will allow SRCU to use these functions, which in turn will allow SRCU to move from a single global callback queue to a per-CPU callback queue. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-18 11:38:19 -07:00
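The sequence-counter idea behind these helpers can be sketched in a few lines, assuming the bottom bit of the counter marks a grace period in progress. The names seq_start(), seq_end(), seq_snap() and seq_done() are hypothetical stand-ins; this single-threaded sketch drops the READ_ONCE()/WRITE_ONCE() accesses and memory barriers that the kernel versions need.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wraparound-safe "a >= b" for unsigned long counters. */
#define SEQ_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

/* Bottom bit set means a grace period is in progress. */
static void seq_start(unsigned long *sp)
{
	*sp += 1;
	assert(*sp & 1);      /* now odd: GP in progress */
}

static void seq_end(unsigned long *sp)
{
	*sp += 1;
	assert(!(*sp & 1));   /* now even: idle */
}

/* Counter value that marks the end of a full grace period starting
 * after this call: skip past any GP already in progress, then one more. */
static unsigned long seq_snap(const unsigned long *sp)
{
	return (*sp + 3) & ~1UL;
}

/* Has the counter reached the snapshot value s? */
static bool seq_done(const unsigned long *sp, unsigned long s)
{
	return SEQ_CMP_GE(*sp, s);
}
```

A snapshot taken while idle (counter 0) yields 2, which one full grace period reaches. A snapshot taken mid-grace-period (counter 3) yields 6, so the in-progress grace period ending (counter 4) is not enough; a complete new one is required.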
Paul E. McKenney	900b1028ec	srcu: Allow SRCU to access rcu_scheduler_active This is primarily a code-movement commit in preparation for allowing SRCU to handle early-boot SRCU grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-18 11:38:18 -07:00
Paul E. McKenney	15fecf89e4	srcu: Abstract multi-tail callback list handling RCU has only one multi-tail callback list, which is implemented via the nxtlist, nxttail, nxtcompleted, qlen_lazy, and qlen fields in the rcu_data structure, and whose operations are open-coded throughout the Tree RCU implementation. This has been more or less OK in the past, but upcoming callback-list optimizations in SRCU could really use a multi-tail callback list there as well. This commit therefore abstracts the multi-tail callback list handling into a new kernel/rcu/rcu_segcblist.h file, and uses this new API. The simple head-and-tail pointer callback list is also abstracted and applied everywhere except for the NOCB callback-offload lists. (Yes, the plan is to apply them there as well, but this commit is already bigger than would be good.) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-18 11:38:18 -07:00
Paul E. McKenney	9226b10d78	rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions The rcu_all_qs() and rcu_note_context_switch() functions do a series of checks, taking various actions to supply RCU with quiescent states, depending on the outcomes of the various checks. This is a bit much for scheduling fastpaths, so this commit creates a separate ->rcu_urgent_qs field in the rcu_dynticks structure that acts as a global guard for these checks. Thus, in the common case, rcu_all_qs() and rcu_note_context_switch() check the ->rcu_urgent_qs field, find it false, and simply return. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org>	2017-04-18 11:38:18 -07:00
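The guard amounts to a single flag test ahead of the expensive work. A hypothetical sketch: struct dynticks_model, all_qs_model() and slow_path_checks() are invented names, and the real slow path does considerably more than bump a counter and clear the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; only the guard idea comes from the commit. */
struct dynticks_model {
	bool rcu_urgent_qs;  /* set when RCU needs a quiescent state soon */
	int slow_checks;     /* counts trips through the expensive path */
};

/* Stand-in for the series of checks and actions. */
static void slow_path_checks(struct dynticks_model *d)
{
	d->slow_checks++;
	d->rcu_urgent_qs = false;
}

/* Fast path in the spirit of rcu_all_qs()/rcu_note_context_switch(). */
static void all_qs_model(struct dynticks_model *d)
{
	if (!d->rcu_urgent_qs)
		return;          /* common case: one load and one branch */
	slow_path_checks(d);
}
```

With the flag clear, repeated calls never reach the slow path; setting it once causes exactly one trip through the slow path.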
Paul E. McKenney	0f9be8cabb	rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle() The rcu_momentary_dyntick_idle() function scans the RCU flavors, checking that one of them still needs a quiescent state before doing an expensive atomic operation on the ->dynticks counter. However, this check reduces overhead only after a rare race condition, and increases complexity. This commit therefore removes the scan and the mechanism enabling the scan. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-18 11:38:18 -07:00
Paul E. McKenney	9577df9a31	rcu: Pull rcu_qs_ctr into rcu_dynticks structure The rcu_qs_ctr variable is yet another isolated per-CPU variable, so this commit pulls it into the pre-existing rcu_dynticks per-CPU structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-18 11:38:17 -07:00
Paul E. McKenney	abb06b9948	rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure The rcu_sched_qs_mask variable is yet another isolated per-CPU variable, so this commit pulls it into the pre-existing rcu_dynticks per-CPU structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-18 11:38:17 -07:00
Paul E. McKenney	88a4976d0e	rcu: Semicolon inside RCU_TRACE() for tree.c The current use of "RCU_TRACE(statement);" can cause odd bugs, especially where "statement" is a local-variable declaration, as it can leave a misplaced ";" in the source code. This commit therefore converts these to "RCU_TRACE(statement;)", which avoids the misplaced ";". Reported-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-18 11:38:17 -07:00
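A minimal sketch of why the semicolon belongs inside the macro argument. RCU_TRACE_MODEL() and MODEL_TRACE_ENABLED are hypothetical stand-ins for RCU_TRACE() and the tracing configuration switch that controls it. With the new convention, a traced declaration expands to nothing at all when tracing is compiled out, instead of leaving a lone ";" among the declarations.

```c
#include <assert.h>

/* Hypothetical switch; set to 0 to compile the tracing out. */
#define MODEL_TRACE_ENABLED 1

#if MODEL_TRACE_ENABLED
#define RCU_TRACE_MODEL(stmt) stmt
#else
#define RCU_TRACE_MODEL(stmt)
#endif

static int trace_events;

static int do_work(int x)
{
	/* New style: the ';' is part of the argument, so with tracing
	 * disabled this line vanishes entirely.  The old style,
	 * RCU_TRACE_MODEL(int traced = x * 2);, would leave a stray ';'
	 * here, i.e. an empty statement ahead of the declaration below. */
	RCU_TRACE_MODEL(int traced = x * 2;)
	int result = x + 1;

	RCU_TRACE_MODEL(trace_events += traced;)
	return result;
}
```

With tracing enabled, do_work(5) returns 6 and records 10 trace events; with it disabled, both RCU_TRACE_MODEL() lines disappear and the function still compiles cleanly.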
Paul E. McKenney	b8c17e6664	rcu: Maintain special bits at bottom of ->dynticks counter Currently, IPIs are used to force other CPUs to invalidate their TLBs in response to a kernel virtual-memory mapping change. This works, but degrades both battery lifetime (for idle CPUs) and real-time response (for nohz_full CPUs), and in addition results in unnecessary IPIs due to the fact that CPUs executing in usermode are unaffected by stale kernel mappings. It would be better to cause a CPU executing in usermode to wait until it is entering kernel mode to do the flush, first to avoid interrupting usermode tasks and second to handle multiple flush requests with a single flush in the case of a long-running user task. This commit therefore reserves a bit at the bottom of the ->dynticks counter, which is checked upon exit from extended quiescent states. If it is set, it is cleared and then a new rcu_eqs_special_exit() macro is invoked, which, if not supplied, is an empty single-pass do-while loop. If this bottom bit is set on -entry- to an extended quiescent state, then a WARN_ON_ONCE() triggers. This bottom bit may be set using a new rcu_eqs_special_set() function, which returns true if the bit was set, or false if the CPU turned out to not be in an extended quiescent state. Please note that this function refuses to set the bit for a non-nohz_full CPU when that CPU is executing in usermode because usermode execution is tracked by RCU as a dyntick-idle extended quiescent state only for nohz_full CPUs. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-04-18 11:19:22 -07:00
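The bit layout can be sketched with C11 atomics. Bit 0 is the special request bit, and the counter advances in steps of 2 so it never disturbs that bit; starting the counter at 2, the CPU is in an extended quiescent state (EQS) whenever bit 1 is clear. All names (CTRL_MASK, CTRL_CTR, dt_special_set(), and so on) are hypothetical, and the hook that rcu_eqs_special_exit() would supply is modeled as a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CTRL_MASK 0x1              /* special "do work on EQS exit" bit */
#define CTRL_CTR  (CTRL_MASK + 1)  /* counter step that skips that bit */

/* Hypothetical per-CPU state. */
struct dynticks_model {
	atomic_int ctr;
	int special_exits;  /* stand-in for calls to the special-exit hook */
};

static void dt_init(struct dynticks_model *d)
{
	atomic_init(&d->ctr, CTRL_CTR);  /* start non-idle */
	d->special_exits = 0;
}

static bool dt_in_eqs(int snap)
{
	return !(snap & CTRL_CTR);
}

static void dt_eqs_enter(struct dynticks_model *d)
{
	int seq = atomic_fetch_add(&d->ctr, CTRL_CTR) + CTRL_CTR;

	assert(!(seq & CTRL_MASK));  /* bit must not be set on entry */
	(void)seq;
}

static void dt_eqs_exit(struct dynticks_model *d)
{
	int seq = atomic_fetch_add(&d->ctr, CTRL_CTR) + CTRL_CTR;

	if (seq & CTRL_MASK) {
		atomic_fetch_and(&d->ctr, ~CTRL_MASK);  /* clear the request */
		d->special_exits++;                     /* run deferred work */
	}
}

/* Set the bit only if the CPU is currently in an EQS. */
static bool dt_special_set(struct dynticks_model *d)
{
	int old = atomic_load(&d->ctr);

	do {
		if (!dt_in_eqs(old))
			return false;
	} while (!atomic_compare_exchange_weak(&d->ctr, &old, old | CTRL_MASK));
	return true;
}
```

The compare-and-exchange loop is what makes the "refuse unless in an EQS" check and the bit-setting a single atomic decision: if the CPU leaves the EQS concurrently, the exchange fails and the loop re-checks the fresh value.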
Steven Rostedt (VMware)	03ecd3f48e	rcu/tracing: Add rcu_disabled to denote when rcu_irq_enter() will not work Tracing uses rcu_irq_enter() as a way to make sure that RCU is watching when it needs to use rcu_read_lock() and friends. This is because tracing can happen as RCU is about to enter user space, or about to go idle, and RCU does not watch for RCU read side critical sections as it makes the transition. There is a small section within the RCU infrastructure where rcu_irq_enter() itself will not work. If tracing were to occur in that section, it would break if it tried to use rcu_irq_enter(). Originally, this happened with the stack_tracer, because it will call save_stack_trace when it encounters stack usage that is greater than any stack usage it had encountered previously. There was a case where that happened in the RCU section where rcu_irq_enter() did not work, and lockdep complained loudly about it. To fix it, stack tracing added a call to be disabled and RCU would disable stack tracing during the critical section in which rcu_irq_enter() was inoperable. This solution worked, but there are other cases that use rcu_irq_enter() and it would be a good idea to let RCU give a way to let others know that rcu_irq_enter() will not work, for example, in trace events. Another helpful aspect of this change is that it also moves the per cpu variable called in the RCU critical section into a cache locale along with other RCU per cpu variables used in that same location. I'm keeping the stack_trace_disable() code, as that still could be used in the future by places that really need to disable it. And since it's only a static inline, it won't take up any kernel text if it is not used. Link: http://lkml.kernel.org/r/20170405093207.404f8deb@gandalf.local.home Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-04-10 15:22:03 -04:00
Paul E. McKenney	a278d47189	rcu: Fix dyntick-idle tracing The tracing subsystem started using rcu_irq_enter() and rcu_irq_exit() (with my blessing) to allow the current _rcuidle alternative tracepoint name to be dispensed with while still maintaining good performance. Unfortunately, this causes RCU's dyntick-idle entry code's tracing to appear to RCU like an interrupt that occurs where RCU is not designed to handle interrupts. This commit fixes this problem by moving the zeroing of ->dynticks_nesting after the offending trace_rcu_dyntick() statement, which narrows the window of vulnerability to a pair of adjacent statements that are now marked with comments to that effect. Link: http://lkml.kernel.org/r/20170405093207.404f8deb@gandalf.local.home Link: http://lkml.kernel.org/r/20170405193928.GM1600@linux.vnet.ibm.com Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-04-10 15:21:57 -04:00
Ingo Molnar	b17b01533b	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/debug.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:34 +01:00
Ingo Molnar	ae7e81c077	sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h> We are going to move scheduler ABI details to <uapi/linux/sched/types.h>, which will be used from a number of .c files. Create empty placeholder header that maps to <linux/types.h>. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:27 +01:00
Ingo Molnar	f9411ebe3d	rcu: Separate the RCU synchronization types and APIs into <linux/rcupdate_wait.h> So rcupdate.h is a pretty complex header, in particular it includes <linux/completion.h> which includes <linux/wait.h> - creating a dependency that includes <linux/wait.h> in <linux/sched.h>, which prevents the isolation of <linux/sched.h> from the derived <linux/wait.h> header. Solve part of the problem by decoupling rcupdate.h from completions: this can be done by separating out the rcu_synchronize types and APIs, and updating their usage sites. Since these are mostly RCU-internal types, this will not just simplify <linux/sched.h>'s dependencies, but will make all the hundreds of .c files that include rcupdate.h but not completions or wait.h build faster. ( For rcutiny this means that two dependent APIs have to be uninlined, but that shouldn't be much of a problem as they are rare variants. ) Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:24 +01:00
Paul E. McKenney	31945aa9f1	Merge branches 'doc.2017.01.15b', 'dyntick.2017.01.23a', 'fixes.2017.01.23a', 'srcu.2017.01.25a' and 'torture.2017.01.15b' into HEAD doc.2017.01.15b: Documentation updates dyntick.2017.01.23a: Dyntick tracking consolidation fixes.2017.01.23a: Miscellaneous fixes srcu.2017.01.25a: SRCU rewrite, fixes, and verification torture.2017.01.15b: Torture-test updates	2017-01-25 12:56:05 -08:00
Paul E. McKenney	38d30b336c	rcu: Adjust FQS offline checks for exact online-CPU detection Commit `7ec99de36f` ("rcu: Provide exact CPU-online tracking for RCU"), as its title suggests, got rid of RCU's remaining CPU-hotplug timing guesswork. This commit therefore removes the one-jiffy kludge that was used to paper over this guesswork. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-01-23 11:44:18 -08:00
Paul E. McKenney	3a19b46a5c	rcu: Check cond_resched_rcu_qs() state less often to reduce GP overhead Commit `4a81e8328d` ("rcu: Reduce overhead of cond_resched() checks for RCU") moved quiescent-state generation out of cond_resched() and commit `bde6c3aa99` ("rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops") introduced cond_resched_rcu_qs(), and commit `5cd37193ce` ("rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors") introduced the per-CPU rcu_qs_ctr variable, which is frequently polled by the RCU core state machine. This frequent polling can increase grace-period rate, which in turn increases grace-period overhead, which is visible in some benchmarks (for example, the "open1" benchmark in Anton Blanchard's "will it scale" suite). This commit therefore reduces the rate at which rcu_qs_ctr is polled by moving that polling into the force-quiescent-state (FQS) machinery, and by further polling it only after the grace period has been in effect for at least jiffies_till_sched_qs jiffies. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-01-23 11:44:18 -08:00
Paul E. McKenney	02a5c550b2	rcu: Abstract extended quiescent state determination This commit is the fourth step towards full abstraction of all accesses to the ->dynticks counter, implementing previously open-coded checks and comparisons in new rcu_dynticks_in_eqs() and rcu_dynticks_in_eqs_since() functions. This abstraction will ease changes to the ->dynticks counter operation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-01-23 11:44:18 -08:00
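The two new predicates can be sketched in userspace C11 as follows. This assumes the pre-special-bit encoding, in which ->dynticks is odd while the CPU is outside an extended quiescent state (EQS) and even while inside one. The *_sketch names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed encoding: odd while outside an EQS, even while inside one. */
struct dynticks_sketch {
	atomic_int dynticks;
};

/* Atomic add of zero: returns the current value as a fully ordered read. */
static int dynticks_snap(struct dynticks_sketch *d)
{
	return atomic_fetch_add(&d->dynticks, 0);
}

/* Was the CPU in an EQS when this snapshot was taken? */
static bool dynticks_in_eqs(int snap)
{
	return !(snap & 0x1);
}

/* Any change in the counter means at least one EQS entry or exit
 * has happened since "snap" was taken. */
static bool dynticks_in_eqs_since(struct dynticks_sketch *d, int snap)
{
	return dynticks_snap(d) != snap;
}
```

A caller would typically check dynticks_in_eqs() on a saved snapshot first. Only if that is false would it later ask dynticks_in_eqs_since() whether the CPU has since passed through an EQS.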
Paul E. McKenney	2625d469ba	rcu: Abstract dynticks extended quiescent state enter/exit operations This commit is the third step towards full abstraction of all accesses to the ->dynticks counter, implementing the previously open-coded atomic add of 1 and entry checks in a new rcu_dynticks_eqs_enter() function, and the same but with exit checks in a new rcu_dynticks_eqs_exit() function. This abstraction will ease changes to the ->dynticks counter operation. Note that this commit gets rid of the smp_mb__before_atomic() and the smp_mb__after_atomic() calls that were previously present. The reason that this is OK from a memory-ordering perspective is that the atomic operation is now atomic_add_return(), which, as a value-returning atomic, guarantees full ordering. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Fixed RCU_TRACE() statements added by this commit. ] Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-01-23 11:42:43 -08:00
Paul E. McKenney	fdbb9b315c	rcu: Make rcu_cpu_starting() use its "cpu" argument The rcu_cpu_starting() function uses this_cpu_ptr() to locate the incoming CPU's rcu_data structure. This works for the boot CPU and for all CPUs onlined after rcu_init() executes (during very early boot). Currently, this is the full set of CPUs, so all is well. But if anyone ever parallelizes boot before rcu_init() time, it will fail. This commit therefore replaces the rcu_cpu_starting() function's use of this_cpu_ptr() with per_cpu_ptr(), future-proofing the code and (arguably) improving readability. This commit inadvertently fixes a latent bug: If there ever had been more than just the boot CPU online at rcu_init() time, the old code would not initialize the non-boot CPUs, but rather would repeatedly initialize the boot CPU. Reported-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-01-23 11:37:13 -08:00
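The latent bug can be modeled with a plain array standing in for per-CPU data. The sketch assumes that every online CPU is set up by code running on the boot CPU. All *_sketch names and helpers are hypothetical stand-ins for this_cpu_ptr() and per_cpu_ptr().

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4

struct rcu_data_sketch {
	bool initialized;
	int cpu;
};

static struct rcu_data_sketch cpu_data[SKETCH_NR_CPUS];

/* Hypothetical: the CPU executing the code (the boot CPU during early init). */
static int executing_cpu;

/* Stand-in for this_cpu_ptr(): the executing CPU's data. */
static struct rcu_data_sketch *this_cpu_data(void)
{
	return &cpu_data[executing_cpu];
}

/* Stand-in for per_cpu_ptr(): the named CPU's data. */
static struct rcu_data_sketch *per_cpu_data(int cpu)
{
	return &cpu_data[cpu];
}

/* Old: ignores "cpu", so it re-initializes the executing CPU every time. */
static void cpu_starting_old(int cpu)
{
	struct rcu_data_sketch *rdp = this_cpu_data();

	(void)cpu;
	rdp->initialized = true;
	rdp->cpu = executing_cpu;
}

/* New: initializes the CPU named by its argument. */
static void cpu_starting_new(int cpu)
{
	struct rcu_data_sketch *rdp = per_cpu_data(cpu);

	rdp->initialized = true;
	rdp->cpu = cpu;
}

static int count_initialized(void)
{
	int n = 0;

	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		n += cpu_data[i].initialized;
	return n;
}

static void reset_cpu_data(void)
{
	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		cpu_data[i] = (struct rcu_data_sketch){ false, 0 };
}
```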
Paul E. McKenney	630c7ed9ca	rcu: Don't wake rcuc/X kthreads on NOCB CPUs Chris Friesen noticed that rcuc/X kthreads were consuming CPU even on NOCB CPUs. This makes no sense because the only purpose of these kthreads is to invoke normal (non-offloaded) callbacks, of which there will never be any on NOCB CPUs. This problem was due to a bug in cpu_has_callbacks_ready_to_invoke(), which should have been checking ->nxttail[RCU_NEXT_TAIL] for NULL, but which was instead (incorrectly) checking ->nxttail[RCU_DONE_TAIL]. Because ->nxttail[RCU_DONE_TAIL] is never NULL, the only effect is to cause the rcuc/X kthread to execute when it should not do so. This commit therefore checks ->nxttail[RCU_NEXT_TAIL], which is NULL for NOCB CPUs. Reported-by: Chris Friesen <chris.friesen@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-01-23 11:37:13 -08:00
Paul E. McKenney	7aa92230c9	rcu: Once again use NMI-based stack traces in stall warnings This commit is for all intents and purposes a revert of `bc1dce514e` ("rcu: Don't use NMIs to dump other CPUs' stacks"). The reason to suppose that this can now safely be reverted is the presence of `42a0bb3f71` ("printk/nmi: generic solution for safe printk in NMI"), which is said to have made NMI-based stack dumps safe. However, this reversion keeps one nice property of `bc1dce514e` ("rcu: Don't use NMIs to dump other CPUs' stacks"), namely that only those CPUs blocking the grace period are dumped. The new trigger_single_cpu_backtrace() is used to make this happen, as suggested by Josh Poimboeuf. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-01-23 11:37:12 -08:00
Paul E. McKenney	b201fa6737	rcu: Remove short-term CPU kicking Commit `4914950aaa` ("rcu: Stop treating in-kernel CPU-bound workloads as errors") added a (relatively) short-timeout call to resched_cpu(). This was inspired by an issue that was fixed by `b7e7ade34e` ("sched/core: Fix remote wakeups"). But given that this issue has been fixed, it is time for the current commit to remove this call to resched_cpu(). Reported-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-01-23 11:37:12 -08:00
Paul E. McKenney	28053bc72c	rcu: Add long-term CPU kicking This commit prepares for the removal of short-term CPU kicking (in a subsequent commit). It does so by starting to invoke resched_cpu() for each holdout at each force-quiescent-state interval that is more than halfway through the stall-warning interval. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-01-23 11:33:02 -08:00
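The "more than halfway through the stall-warning interval" test can be sketched as a wraparound-safe jiffies comparison. The macro follows the style of the kernel's ULONG_CMP_GE() idiom. The should_kick_holdout() helper and its parameters are hypothetical, and this sketch kicks at or past the halfway point.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wraparound-safe "a >= b" for unsigned jiffies-style counters, in the
 * style of the kernel's ULONG_CMP_GE() idiom. */
#define SKETCH_ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

/* Hypothetical helper: kick a holdout CPU at this force-quiescent-state scan
 * once the grace period has run for at least half the stall-warning timeout. */
static bool should_kick_holdout(unsigned long now, unsigned long gp_start,
				unsigned long stall_timeout)
{
	return SKETCH_ULONG_CMP_GE(now, gp_start + stall_timeout / 2);
}
```

The unsigned subtraction keeps the comparison correct when the jiffies counter wraps between gp_start and now.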
Tobias Klauser	94060d2235	rcu: Remove unused but set variable Since commit `7ec99de36f` ("rcu: Provide exact CPU-online tracking for RCU"), the variable mask in rcu_init_percpu_data is set but no longer used. Remove it to fix the following warning when building with 'W=1': kernel/rcu/tree.c: In function ‘rcu_init_percpu_data’: kernel/rcu/tree.c:3765:16: warning: variable ‘mask’ set but not used [-Wunused-but-set-variable] Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-01-23 11:32:35 -08:00
Byungchul Park	c4402b27f1	rcu: Only dump stalled-tasks stacks if there was a real stall The print_other_cpu_stall() function currently unconditionally invokes rcu_print_detail_task_stall(). This is OK because if there was a stall sufficient to cause print_other_cpu_stall() to be invoked, that stall is very likely to persist through the entire print_other_cpu_stall() execution. However, if the stall did not persist, the variable ndetected will be zero, and that variable is already tested in an "if" statement. Therefore, this commit moves the call to rcu_print_detail_task_stall() under that pre-existing "if" to improve readability, with a very rare reduction in overhead. Signed-off-by: Byungchul Park <byungchul.park@lge.com> [ paulmck: Reworked commit log. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-01-23 11:32:22 -08:00
Paul E. McKenney	8b2f63ab05	rcu: Abstract the dynticks snapshot operation This commit is the second step towards full abstraction of all accesses to the ->dynticks counter, implementing the previously open-coded atomic add of zero in a new rcu_dynticks_snap() function. This abstraction will ease changes to the ->dynticks counter operation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-01-16 15:47:53 -08:00
Paul E. McKenney	6563de9d6f	rcu: Abstract the dynticks momentary-idle operation This commit is the first step towards full abstraction of all accesses to the ->dynticks counter, implementing the previously open-coded atomic add of two in a new rcu_dynticks_momentary_idle() function. This abstraction will ease changes to the ->dynticks counter operation. Note that this commit gets rid of the smp_mb__before_atomic() and the smp_mb__after_atomic() calls that were previously present. The reason that this is OK from a memory-ordering perspective is that the atomic operation is now atomic_add_return(), which, as a value-returning atomic, guarantees full ordering. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-01-16 14:59:54 -08:00
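The ordering change can be illustrated, roughly, in C11 atomics. C11 seq_cst operations are only an approximate analogue of the Linux-kernel memory model, so treat this as an illustration rather than an equivalence. The function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Old shape: explicit full fences around a non-value-returning atomic add,
 * roughly smp_mb__before_atomic(); atomic_add(2, ...); smp_mb__after_atomic(); */
static void momentary_idle_old(atomic_int *dynticks)
{
	atomic_thread_fence(memory_order_seq_cst);
	atomic_fetch_add_explicit(dynticks, 2, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/* New shape: one value-returning, sequentially consistent atomic, roughly
 * analogous to atomic_add_return(2, ...). Adding 2 preserves the low bit, so
 * the CPU never appears idle, yet any earlier snapshot now differs. */
static int momentary_idle_new(atomic_int *dynticks)
{
	return atomic_fetch_add(dynticks, 2) + 2;
}
```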
Paul E. McKenney	52d7e48b86	rcu: Narrow early boot window of illegal synchronous grace periods The current preemptible RCU implementation goes through three phases during bootup. In the first phase, there is only one CPU that is running with preemption disabled, so that a no-op is a synchronous grace period. In the second mid-boot phase, the scheduler is running, but RCU has not yet gotten its kthreads spawned (and, for expedited grace periods, workqueues are not yet running). During this time, any attempt to do a synchronous grace period will hang the system (or complain bitterly, depending). In the third and final phase, RCU is fully operational and everything works normally. This has been OK for some time, but there have recently been some synchronous grace periods showing up during the second mid-boot phase. This code worked "by accident" for a while, but started failing as soon as expedited RCU grace periods switched over to workqueues in commit `8b355e3bc1` ("rcu: Drive expedited grace periods from workqueue"). Note that the code was buggy even before this commit, as it was subject to failure on real-time systems that forced all expedited grace periods to run as normal grace periods (for example, using the rcu_normal ksysfs parameter). The callchain from the failure case is as follows: early_amd_iommu_init() |-> acpi_put_table(ivrs_base); |-> acpi_tb_put_table(table_desc); |-> acpi_tb_invalidate_table(table_desc); |-> acpi_tb_release_table(...) |-> acpi_os_unmap_memory |-> acpi_os_unmap_iomem |-> acpi_os_map_cleanup |-> synchronize_rcu_expedited The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y, which caused the code to try using workqueues before they were initialized, which did not go well. This commit therefore reworks RCU to permit synchronous grace periods to proceed during this mid-boot phase. This commit is therefore a fix to a regression introduced in v4.9, and is therefore being put forward post-merge-window in v4.10. 
This commit sets a flag from the existing rcu_scheduler_starting() function which causes all synchronous grace periods to take the expedited path. The expedited path now checks this flag, using the requesting task to drive the expedited grace period forward during the mid-boot phase. Finally, this flag is updated by a core_initcall() function named rcu_exp_runtime_mode(), which causes the runtime codepaths to be used. Note that this arrangement assumes that tasks are not sent POSIX signals (or anything similar) from the time that the first task is spawned through core_initcall() time. Fixes: `8b355e3bc1` ("rcu: Drive expedited grace periods from workqueue") Reported-by: "Zheng, Lv" <lv.zheng@intel.com> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Stan Kain <stan.kain@gmail.com> Tested-by: Ivan <waffolz@hotmail.com> Tested-by: Emanuel Castelo <emanuel.castelo@gmail.com> Tested-by: Bruno Pesavento <bpesavento@infinito.it> Tested-by: Borislav Petkov <bp@suse.de> Tested-by: Frederic Bezies <fredbezies@gmail.com> Cc: <stable@vger.kernel.org> # 4.9.0-	2017-01-14 21:23:48 -08:00
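The three boot phases and the path a synchronous grace period would take in each can be sketched as a small state machine. The enum values and functions are hypothetical and only mirror the commit's description: rcu_scheduler_starting() opens the mid-boot phase and a core_initcall() later switches to the runtime codepaths.

```c
#include <assert.h>

/* The three boot phases described in the commit. Names are hypothetical. */
enum boot_phase { PHASE_INACTIVE, PHASE_INIT, PHASE_RUNNING };

enum gp_path {
	GP_NOOP,		/* one CPU, preemption disabled: nothing to wait for */
	GP_EXPEDITED_BY_CALLER,	/* mid-boot: the requesting task drives the GP */
	GP_RUNTIME		/* normal kthread/workqueue-driven machinery */
};

static enum boot_phase phase = PHASE_INACTIVE;

/* Called when the scheduler starts: enter the mid-boot phase. */
static void sketch_scheduler_starting(void)
{
	assert(phase == PHASE_INACTIVE);
	phase = PHASE_INIT;
}

/* Called later at core_initcall() time: switch to the runtime codepaths. */
static void sketch_exp_runtime_mode(void)
{
	assert(phase == PHASE_INIT);
	phase = PHASE_RUNNING;
}

/* Which path would a synchronous grace-period request take right now? */
static enum gp_path sketch_sync_gp_path(void)
{
	switch (phase) {
	case PHASE_INACTIVE:
		return GP_NOOP;
	case PHASE_INIT:
		return GP_EXPEDITED_BY_CALLER;
	default:
		return GP_RUNTIME;
	}
}
```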
Paul E. McKenney	aa3e0bf1aa	rcu: Don't kick unless grace period or request The current code can result in spurious kicks when there are no grace periods in progress and no grace-period-related requests. This is sort of OK for a diagnostic aid, but the resulting ftrace-dump messages in dmesg are annoying. This commit therefore avoids spurious kicks in the common case. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2016-11-14 10:46:31 -08:00
Paul E. McKenney	1f21b50b77	rcu: Remove obsolete comment from __call_rcu() The __call_rcu() comment about opportunistically noting grace period beginnings and endings is obsolete. RCU still does such opportunistic noting, but in __call_rcu_core() rather than __call_rcu(), and there already is an appropriate comment in __call_rcu_core(). This commit therefore removes the obsolete comment. Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2016-11-14 10:46:19 -08:00
Paul E. McKenney	5403d367a7	rcu: Remove obsolete rcu_check_callbacks() header comment In the deep past, rcu_check_callbacks() was only invoked if rcu_pending() returned true. Which was fine, but these days rcu_check_callbacks() is invoked unconditionally. This commit therefore removes the obsolete sentence from the header comment. Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2016-11-14 10:46:14 -08:00
Paul E. McKenney	b8f2ed5384	rcu: Tighten up __call_rcu() rcu_head alignment check Commit `720abae3d6` ("rcu: force alignment on struct callback_head/rcu_head") forced the rcu_head (AKA callback_head) structure's alignment to pointer size, that is, to 4-byte boundaries on 32-bit systems and to 8-byte boundaries on 64-bit systems. This commit therefore checks for this same alignment in __call_rcu(), which used to contain a looser check for two-byte alignment. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2016-11-14 10:46:08 -08:00
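The difference between the old two-byte check and the tightened pointer-size check is a single mask. The helper names below are hypothetical; the kernel performs its check inline in __call_rcu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old, looser check: only two-byte alignment is required. */
static bool misaligned_old(const void *head)
{
	return ((uintptr_t)head & 0x1) != 0;
}

/* Tightened check: pointer-size alignment, i.e. 4 bytes on 32-bit and
 * 8 bytes on 64-bit systems. */
static bool misaligned_new(const void *head)
{
	return ((uintptr_t)head & (sizeof(void *) - 1)) != 0;
}
```

An address two bytes past a pointer-aligned boundary passes the old check but is rejected by the new one.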
Linus Torvalds	9ffc66941d	Merge tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull gcc plugins update from Kees Cook: "This adds a new gcc plugin named "latent_entropy". It is designed to extract as much possible uncertainty from a running system at boot time as possible, hoping to capitalize on any possible variation in CPU operation (due to runtime data differences, hardware differences, SMP ordering, thermal timing variation, cache behavior, etc). 
At the very least, this plugin is a much more comprehensive example for how to manipulate kernel code using the gcc plugin internals" * tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: latent_entropy: Mark functions with __latent_entropy gcc-plugins: Add latent_entropy plugin	2016-10-15 10:03:15 -07:00
Emese Revfy	0766f788eb	latent_entropy: Mark functions with __latent_entropy The __latent_entropy gcc attribute can be used only on functions and variables. If it is on a function then the plugin will instrument it for gathering control-flow entropy. If the attribute is on a variable then the plugin will initialize it with random contents. The variable must be an integer, an integer array type or a structure with integer fields. These specific functions have been selected because they are init functions (to help gather boot-time entropy), are called at unpredictable times, or have variable loops, each of which provides some level of latent entropy. Signed-off-by: Emese Revfy <re.emese@gmail.com> [kees: expanded commit message] Signed-off-by: Kees Cook <keescook@chromium.org>	2016-10-10 14:51:45 -07:00
Paul E. McKenney	d74b62bc32	Merge branches 'doc.2016.08.22c', 'exp.2016.08.22c', 'fixes.2016.09.14a', 'hotplug.2016.08.22c' and 'torture.2016.08.22c' into HEAD doc.2016.08.22c: Documentation updates exp.2016.08.22c: Expedited grace-period updates fixes.2016.09.14a: Miscellaneous fixes hotplug.2016.08.22c: CPU-hotplug changes torture.2016.08.22c: Torture-test changes	2016-09-14 12:58:49 -07:00
Paul E. McKenney	7ec99de36f	rcu: Provide exact CPU-online tracking for RCU Up to now, RCU has assumed that the CPU-online process makes it from CPU_UP_PREPARE to set_cpu_online() within one jiffy. Given the recent rise of virtualized environments, this assumption is very clearly obsolete. Failing to meet this deadline can result in RCU paying attention to an incoming CPU for one jiffy, then ignoring it until the grace period following the one in which that CPU sets itself online. This situation might prove to be fatally disappointing to any RCU read-side critical sections that had the misfortune to execute during the time in which RCU was ignoring the slow-to-come-online CPU. This commit therefore updates RCU's internal CPU state-tracking information at notify_cpu_starting() time, thus providing RCU with an exact transition of the CPU's state from offline to online. Note that this means that incoming CPUs must not use RCU read-side critical sections (other than those of SRCU) until notify_cpu_starting() time. Note also that the CPU_STARTING notifiers -are- allowed to use RCU read-side critical sections. (Of course, CPU-hotplug notifiers are rapidly becoming obsolete, so you need to act fast!) If a given architecture or CPU family needs to use RCU read-side critical sections earlier, the call to rcu_cpu_starting() from notify_cpu_starting() will need to be architecture-specific, with architectures that need early use being required to hand-place the call to rcu_cpu_starting() at some point preceding the call to notify_cpu_starting(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-08-22 09:36:57 -07:00
Paul E. McKenney	3563a438f1	rcu: Avoid redundant quiescent-state chasing Currently, __note_gp_changes() checks to see if the CPU has slept through multiple grace periods. If it has, it resynchronizes that CPU's view of the grace-period state, which includes whether or not the current grace period needs a quiescent state from this CPU. The fact of this need (or lack thereof) needs to be in two places, rdp->cpu_no_qs.b.norm and rdp->core_needs_qs. The former tells RCU's context-switch code to go get a quiescent state and the latter says that it needs to be reported. The current code unconditionally sets the former to true, but correctly sets the latter. This does not result in failures, but it does unnecessarily increase the amount of work done on average at context-switch time. This commit therefore correctly sets both fields. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-08-22 09:35:57 -07:00
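The corrected resynchronization can be sketched by deriving both fields from the same condition. The sketch assumes the need for a quiescent state is indicated by this CPU's bit in its leaf rcu_node's ->qsmask. Structure and field names are hypothetical flattenings, for example cpu_no_qs.b.norm becomes cpu_no_qs_norm.

```c
#include <assert.h>
#include <stdbool.h>

struct rcu_node_sketch {
	unsigned long qsmask;	/* CPUs that still owe a quiescent state */
};

struct rcu_data_sketch {
	unsigned long grpmask;	/* this CPU's bit in the leaf's qsmask */
	bool cpu_no_qs_norm;	/* "go get a quiescent state" (context-switch side) */
	bool core_needs_qs;	/* "a quiescent state must be reported" */
};

/* Resynchronize after sleeping through grace periods: set both fields from
 * the same condition instead of forcing the first one to true. */
static void resync_qs_needs(const struct rcu_node_sketch *rnp,
			    struct rcu_data_sketch *rdp)
{
	bool need_qs = (rnp->qsmask & rdp->grpmask) != 0;

	rdp->cpu_no_qs_norm = need_qs;
	rdp->core_needs_qs = need_qs;
}
```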
Paul Gortmaker	e77b704125	rcu: Don't use modular infrastructure in non-modular code The Kconfig currently controlling compilation of tree.c is: init/Kconfig:config TREE_RCU init/Kconfig: bool ...and update.c and sync.c are "obj-y" meaning that none are ever built as a module by anyone. Since MODULE_ALIAS is a no-op for non-modular code, we can remove them from these files. We leave moduleparam.h behind since the files instantiate some boot time configuration parameters with module_param() still. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-08-22 09:35:27 -07:00
Jisheng Zhang	94d4477673	rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads Commit `abedf8e241` ("rcu: Use simple wait queues where possible in rcutree") converts Tree RCU's wait queues to simple wait queues, but it incorrectly reverts the commit `2aa792e6fa` ("rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads"). This can result in redundant self-wakeups. This commit therefore replaces the simple wait-queue wakeups with rcu_gp_kthread_wake(), thus avoiding the redundant wakeups. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-08-22 09:33:46 -07:00
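The guard that avoids redundant wakeups can be sketched as follows. The sketch assumes a wakeup is skipped when the caller is the grace-period kthread itself, when there is nothing for the kthread to do, or when the kthread has not been spawned yet. All names are hypothetical, and a counter stands in for the actual wait-queue wakeup.

```c
#include <assert.h>
#include <stdbool.h>

struct task_sketch {
	const char *name;
};

struct rcu_state_sketch {
	struct task_sketch *gp_kthread;	/* grace-period kthread, if spawned */
	int gp_flags;			/* pending requests for the kthread */
	int wakeups;			/* wakeups actually issued */
};

/* Hypothetical: the task running this code. */
static struct task_sketch *current_task;

/* Wake the grace-period kthread only when doing so can accomplish something. */
static void gp_kthread_wake(struct rcu_state_sketch *rsp)
{
	if (current_task == rsp->gp_kthread ||	/* no self-wakeups */
	    !rsp->gp_flags ||			/* nothing for it to do */
	    !rsp->gp_kthread)			/* not spawned yet */
		return;
	rsp->wakeups++;	/* stands in for the wait-queue wakeup */
}
```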
Thomas Gleixner	4df8374254	rcu: Convert rcutree to hotplug state machine Straightforward conversion to the state machine, though the question arises whether this really needs all these state transitions to work. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153337.982013161@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-07-15 10:41:44 +02:00
Mark Rutland	bc75e99983	rcu: Correctly handle sparse possible cpus In many cases in the RCU tree code, we iterate over the set of cpus for a leaf node described by rcu_node::grplo and rcu_node::grphi, checking per-cpu data for each cpu in this range. However, if the set of possible cpus is sparse, some cpus described in this range are not possible, and thus no per-cpu region will have been allocated (or initialised) for them by the generic percpu code. Erroneous accesses to a per-cpu area for these !possible cpus may fault or may hit other data depending on the address generated when the erroneous per cpu offset is applied. In practice, both cases have been observed on arm64 hardware (the former being silent, but detectable with additional patches). To avoid issues resulting from this, we must iterate over the set of possible cpus for a given leaf node. This patch adds a new helper, for_each_leaf_node_possible_cpu, to enable this. As iteration is often intertwined with rcu_node local bitmask manipulation, a new leaf_node_cpu_bit helper is added to make this simpler and more consistent. The RCU tree code is made to use both of these where appropriate. 
Without this patch, running reboot at a shell can result in an oops like: [ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c [ 3369.083881] pgd = ffffffc3ecdda000 [ 3369.087270] [ffffff8008b21b4c] pgd=00000083eca48003, pud=00000083eca48003, *pmd=0000000000000000 [ 3369.096222] Internal error: Oops: 96000007 [#1] PREEMPT SMP [ 3369.101781] Modules linked in: [ 3369.104825] CPU: 2 PID: 1817 Comm: NetworkManager Tainted: G W 4.6.0+ #3 [ 3369.121239] task: ffffffc0fa13e000 ti: ffffffc3eb940000 task.ti: ffffffc3eb940000 [ 3369.128708] PC is at sync_rcu_exp_select_cpus+0x188/0x510 [ 3369.134094] LR is at sync_rcu_exp_select_cpus+0x104/0x510 [ 3369.139479] pc : [<ffffff80081109a8>] lr : [<ffffff8008110924>] pstate: 200001c5 [ 3369.146860] sp : ffffffc3eb9435a0 [ 3369.150162] x29: ffffffc3eb9435a0 x28: ffffff8008be4f88 [ 3369.155465] x27: ffffff8008b66c80 x26: ffffffc3eceb2600 [ 3369.160767] x25: 0000000000000001 x24: ffffff8008be4f88 [ 3369.166070] x23: ffffff8008b51c3c x22: ffffff8008b66c80 [ 3369.171371] x21: 0000000000000001 x20: ffffff8008b21b40 [ 3369.176673] x19: ffffff8008b66c80 x18: 0000000000000000 [ 3369.181975] x17: 0000007fa951a010 x16: ffffff80086a30f0 [ 3369.187278] x15: 0000007fa9505590 x14: 0000000000000000 [ 3369.192580] x13: ffffff8008b51000 x12: ffffffc3eb940000 [ 3369.197882] x11: 0000000000000006 x10: ffffff8008b51b78 [ 3369.203184] x9 : 0000000000000001 x8 : ffffff8008be4000 [ 3369.208486] x7 : ffffff8008b21b40 x6 : 0000000000001003 [ 3369.213788] x5 : 0000000000000000 x4 : ffffff8008b27280 [ 3369.219090] x3 : ffffff8008b21b4c x2 : 0000000000000001 [ 3369.224406] x1 : 0000000000000001 x0 : 0000000000000140 ... 
[ 3369.972257] [<ffffff80081109a8>] sync_rcu_exp_select_cpus+0x188/0x510 [ 3369.978685] [<ffffff80081128b4>] synchronize_rcu_expedited+0x64/0xa8 [ 3369.985026] [<ffffff80086b987c>] synchronize_net+0x24/0x30 [ 3369.990499] [<ffffff80086ddb54>] dev_deactivate_many+0x28c/0x298 [ 3369.996493] [<ffffff80086b6bb8>] __dev_close_many+0x60/0xd0 [ 3370.002052] [<ffffff80086b6d48>] __dev_close+0x28/0x40 [ 3370.007178] [<ffffff80086bf62c>] __dev_change_flags+0x8c/0x158 [ 3370.012999] [<ffffff80086bf718>] dev_change_flags+0x20/0x60 [ 3370.018558] [<ffffff80086cf7f0>] do_setlink+0x288/0x918 [ 3370.023771] [<ffffff80086d0798>] rtnl_newlink+0x398/0x6a8 [ 3370.029158] [<ffffff80086cee84>] rtnetlink_rcv_msg+0xe4/0x220 [ 3370.034891] [<ffffff80086e274c>] netlink_rcv_skb+0xc4/0xf8 [ 3370.040364] [<ffffff80086ced8c>] rtnetlink_rcv+0x2c/0x40 [ 3370.045663] [<ffffff80086e1fe8>] netlink_unicast+0x160/0x238 [ 3370.051309] [<ffffff80086e24b8>] netlink_sendmsg+0x2f0/0x358 [ 3370.056956] [<ffffff80086a0070>] sock_sendmsg+0x18/0x30 [ 3370.062168] [<ffffff80086a21cc>] ___sys_sendmsg+0x26c/0x280 [ 3370.067728] [<ffffff80086a30ac>] __sys_sendmsg+0x44/0x88 [ 3370.073027] [<ffffff80086a3100>] SyS_sendmsg+0x10/0x20 [ 3370.078153] [<ffffff8008085e70>] el0_svc_naked+0x24/0x28 Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Dennis Chen <dennis.chen@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-06-15 16:00:05 -07:00
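The iteration pattern described in this commit can be modelled in user space. This is a minimal sketch, not the kernel code: it assumes CPU ids fit in a 64-bit word, and `struct leaf_node`, `next_possible_cpu()`, `for_each_leaf_possible_cpu()` and `leaf_cpu_bit()` are hypothetical stand-ins for `struct rcu_node`, the cpumask iteration helpers, `for_each_leaf_node_possible_cpu` and `leaf_node_cpu_bit`. It shows how walking only the possible CPUs in `[grplo, grphi]` skips the holes in a sparse mask while still producing leaf-relative bit positions.

```c
#include <stdint.h>

#define NR_MODEL_CPUS 64 /* hypothetical: every CPU id fits in a uint64_t mask */

/* Hypothetical stand-in for the CPU range of a leaf struct rcu_node. */
struct leaf_node {
	int grplo; /* lowest CPU id covered by this leaf */
	int grphi; /* highest CPU id covered by this leaf */
};

/* Next CPU id greater than prev whose bit is set in mask, else NR_MODEL_CPUS. */
static int next_possible_cpu(int prev, uint64_t mask)
{
	for (int cpu = prev + 1; cpu < NR_MODEL_CPUS; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NR_MODEL_CPUS;
}

/* Visit only the possible CPUs in [grplo, grphi]; holes are never touched. */
#define for_each_leaf_possible_cpu(rnp, cpu, mask)                \
	for ((cpu) = next_possible_cpu((rnp)->grplo - 1, (mask)); \
	     (cpu) <= (rnp)->grphi;                               \
	     (cpu) = next_possible_cpu((cpu), (mask)))

/* Bit for a CPU in the leaf-local bitmask (position relative to grplo). */
#define leaf_cpu_bit(rnp, cpu) (UINT64_C(1) << ((cpu) - (rnp)->grplo))

/* Leaf-local mask of the possible CPUs covered by rnp. */
static uint64_t leaf_possible_bits(const struct leaf_node *rnp, uint64_t possible)
{
	uint64_t bits = 0;
	int cpu;

	for_each_leaf_possible_cpu(rnp, cpu, possible)
		bits |= leaf_cpu_bit(rnp, cpu);
	return bits;
}
```

With the sparse mask 0x330 (CPUs 4, 5, 8 and 9 possible) and a leaf covering CPUs 4-11, the loop visits exactly four CPUs and yields the leaf-local mask 0x33; CPUs 6, 7, 10 and 11 are never touched, which is the property the patch relies on to avoid per-cpu accesses for !possible cpus.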
Daniel Bristot de Oliveira	088e9d253d	rcu: sysctl: Panic on RCU Stall It is not always easy to determine the cause of an RCU stall just by analysing the RCU stall messages, especially when the problem is caused by the indirect starvation of rcu threads, for example, when preempt_rcu is not awakened due to the starvation of a timer softirq. We have been hard coding panic() in the RCU stall functions for some time while testing the kernel-rt. But this is not possible in some scenarios, like when supporting customers. This patch implements the sysctl kernel.panic_on_rcu_stall. If set to 1, the system will panic() when an RCU stall takes place, enabling the capture of a vmcore. The vmcore provides a way to analyze all kernel/tasks states, helping to point to the culprit and the solution for the stall. The kernel.panic_on_rcu_stall sysctl is disabled by default. Changes from v1: - Fixed a typo in the git log - The if(sysctl_panic_on_rcu_stall) panic() is in a static function - Fixed the CONFIG_TINY_RCU compilation issue - The var sysctl_panic_on_rcu_stall is now __read_mostly Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Arnaldo Carvalho de Melo <acme@kernel.org> Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-06-15 16:00:05 -07:00
Paul E. McKenney	3549c2bc2c	rcu: Move expedited code from tree.c to tree_exp.h People have been having some difficulty finding their way around the RCU code. This commit therefore pulls some of the expedited grace-period code from tree.c to a new tree_exp.h file. This commit is strictly code movement, with the exception of a forward declaration that was added for the sync_sched_exp_online_cleanup() function. A subsequent commit will move the remaining expedited grace-period code from tree_plugin.h to tree_exp.h. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-06-14 16:01:41 -07:00
Peter Zijlstra	d3acab65f2	rcu: Remove some superfluous lines I think you'll find this condition is superfluous, as the whole function is under #ifdef of that same. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-06-14 16:01:41 -07:00
Paul E. McKenney	590d1757b9	rcu: Fix outdated hotplug-exclusion comment in rcu_gp_init() In the past, RCU grace-period initialization excluded CPU-hotplug operations, but this is no longer the case. This commit therefore removes an outdated comment in rcu_gp_init() claiming that these are excluded. Reported-by: Lihao Liang <lihao.liang@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-06-14 16:01:40 -07:00
Paul E. McKenney	0d95092ccb	rcu: Fix outdated rcu_scheduler_active comment The comment header for rcu_scheduler_active states that it is used to optimize synchronize_sched() at early boot. This is incorrect. The synchronize_sched() function instead checks the number of online CPUs. This commit therefore replaces the comment's synchronize_sched() with synchronize_rcu(), which really does use rcu_scheduler_active for this purpose. Reported-by: Lihao Liang <lihao.liang@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-06-14 16:01:39 -07:00
Paul E. McKenney	dcd36d01fb	Merge branches 'doc.2016.04.19a', 'exp.2016.03.31d', 'fixes.2016.03.31d' and 'torture.2016.04.21a' into HEAD doc.2016.04.19a: Documentation updates exp.2016.03.31d: Expedited grace-period updates fixes.2016.03.31d: Miscellaneous fixes torture.2016.04.21a: Torture-test updates	2016-04-21 13:48:20 -07:00
Paul E. McKenney	291783b8ad	rcutorture: Expedited-GP batch progress access to torturing This commit provides rcu_exp_batches_completed() and rcu_exp_batches_completed_sched() functions to allow torture-test modules to check how many expedited grace period batches have completed. These are analogous to the existing rcu_batches_completed(), rcu_batches_completed_bh(), and rcu_batches_completed_sched() functions. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:37:37 -07:00
Paul E. McKenney	5dffed1e57	rcu: Dump ftrace buffer when kicking grace-period kthread If it is necessary to kick the grace-period kthread, that is a good time to dump the trace buffer in order to learn why kicking was needed. This commit therefore does the dump. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:36:37 -07:00
Paul E. McKenney	8c7c4829a8	rcu: Awaken grace-period kthread if too long since FQS Recent kernels can fail to awaken the grace-period kthread for quiescent-state forcing. This commit is a crude hack that does a wakeup if a scheduling-clock interrupt sees that it has been too long since force-quiescent-state (FQS) processing. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:50 -07:00
Paul E. McKenney	fcfd0a237b	rcu: Make FQS schedule advance only if FQS happened Currently, the force-quiescent-state (FQS) code in rcu_gp_kthread() can advance the next FQS even if one was not executed last time. This can happen due to timeout-duration uncertainty. This commit therefore avoids advancing the FQS schedule unless an FQS was just executed. In the corner case where an FQS was not executed, but is due now, the code does a one-jiffy wait. This change prepares for kthread kicking. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:49 -07:00
Paul E. McKenney	86057b80ae	rcu: Awaken grace-period kthread when stalled Recent kernels can fail to awaken the grace-period kthread for quiescent-state forcing. This commit is a crude hack that does a wakeup any time a stall is detected. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:49 -07:00
Paul E. McKenney	3b5f668e71	rcu: Overlap wakeups with next expedited grace period The current expedited grace-period implementation makes subsequent grace periods wait on wakeups for the prior grace period. This does not fit the dictionary definition of "expedited", so this commit allows these two phases to overlap. Doing this requires four waitqueues rather than two because tasks can now be waiting on the previous, current, and next grace periods. The fourth waitqueue makes the bit masking work out nicely. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:11 -07:00
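The four-waitqueue arithmetic can be illustrated with a small model. This sketch assumes that the low bit of the expedited sequence number marks a grace period in progress and that the remaining bits count grace periods; `exp_wq_index()`, `seq_for_gp()`, `neighbours_distinct()` and `EXP_NR_WQ` are hypothetical names, not kernel identifiers. Three queues would be enough to keep the previous, current and next grace periods apart, but four is a power of two, so the queue index is a mask rather than a division.

```c
#include <stdint.h>

#define EXP_NR_WQ 4 /* power of two, so "& (EXP_NR_WQ - 1)" is a cheap modulo */

/* Assumed layout: bit 0 = grace period in progress, bits 1.. = GP count. */
static unsigned int exp_wq_index(uint64_t seq)
{
	return (unsigned int)((seq >> 1) & (EXP_NR_WQ - 1));
}

/* Sequence value for a given grace-period count, not yet started. */
static uint64_t seq_for_gp(uint64_t gp)
{
	return gp << 1;
}

/* Do the previous, current and next grace periods map to distinct queues? */
static int neighbours_distinct(uint64_t gp)
{
	unsigned int prev = exp_wq_index(seq_for_gp(gp - 1));
	unsigned int cur = exp_wq_index(seq_for_gp(gp));
	unsigned int next = exp_wq_index(seq_for_gp(gp + 1));

	return prev != cur && cur != next && prev != next;
}
```

Because any three consecutive grace-period counts are distinct modulo four, tasks waiting on the previous, current and next grace periods never share a queue; with only two queues, the previous and next grace periods would collide.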
Paul E. McKenney	aff12cdf86	rcu: Consolidate expedited GP code into exp_funnel_lock() This commit pulls the grace-period-start counter adjustment and tracing from synchronize_rcu_expedited() and synchronize_sched_expedited() into exp_funnel_lock(), thus eliminating some code duplication. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:11 -07:00
Paul E. McKenney	179e5dcd1e	rcu: Consolidate expedited GP tracing into rcu_exp_gp_seq_snap() This commit moves some duplicate code from synchronize_rcu_expedited() and synchronize_sched_expedited() into rcu_exp_gp_seq_snap(). This doesn't save lines of code, but does eliminate a "tell me twice" issue. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:10 -07:00
Paul E. McKenney	4ea3e85b11	rcu: Consolidate expedited GP code into rcu_exp_wait_wake() Currently, synchronize_rcu_expedited() and rcu_sched_expedited() have significant duplicate code. This commit therefore consolidates some of this code into rcu_exp_wake(), which is now renamed to rcu_exp_wait_wake() in recognition of its added responsibilities. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:10 -07:00
Paul E. McKenney	356051e1de	rcu: Add exp_funnel_lock() fastpath This commit speeds up the low-contention case, especially for systems with large rcu_node trees, by attempting to directly acquire the ->exp_mutex. This fastpath checks the leaves and root first in order to avoid excessive memory contention on the mutex itself. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:09 -07:00
Paul E. McKenney	f6a12f34a4	rcu: Enforce expedited-GP fairness via funnel wait queue The current mutex-based funnel-locking approach used by expedited grace periods is subject to severe unfairness. The problem arises when a few tasks, making a path from leaves to root, all wake up before other tasks do. A new task can then follow this path all the way to the root, which needlessly delays tasks whose grace period is done, but who do not happen to acquire the lock quickly enough. This commit avoids this problem by maintaining per-rcu_node wait queues, along with a per-rcu_node counter that tracks the latest grace period sought by an earlier task to visit this node. If that grace period would satisfy the current task, instead of proceeding up the tree, it waits on the current rcu_node structure using a pair of wait queues provided for that purpose. This decouples awakening of old tasks from the arrival of new tasks. If the wakeups prove to be a bottleneck, additional kthreads can be brought to bear for that purpose. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:08 -07:00
Paul E. McKenney	d40a4f09a4	rcu: Shorten expedited_workdone* to exp_workdone* Just a name change to save a few lines and a bit of typing. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:08 -07:00
Paul E. McKenney	ec3833ed02	rcu: Force boolean subscript for expedited stall warnings The cpu_online() function can return values other than 0 and 1, which can result in subscript overflow when applied to a two-element array. This commit allows for this behavior by using "!!" on the return value from cpu_online() when used as a subscript. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:07 -07:00
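The "!!" idiom can be shown in isolation. In this sketch, `model_cpu_online()` is a hypothetical stand-in that, like a raw bit test, returns the masked word rather than 0 or 1; using that value directly as the index of a two-element array would read out of bounds, while "!!" folds any nonzero value to exactly 1. The flag characters are illustrative and are not the ones the kernel prints.

```c
/* Hypothetical cpu_online() stand-in: returns the raw masked bit, so an
 * online CPU 2 yields 4, not 1. */
static unsigned long model_cpu_online(unsigned long online_mask, int cpu)
{
	return online_mask & (1UL << cpu);
}

/* Two-entry table indexed by a boolean: 0 = offline, 1 = online. */
static const char online_flag_chars[2] = { '-', 'O' };

static char online_flag(unsigned long online_mask, int cpu)
{
	/* "!!" maps any nonzero value to exactly 1, keeping the index in range. */
	return online_flag_chars[!!model_cpu_online(online_mask, cpu)];
}
```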
Paul E. McKenney	e2fd9d3584	rcu: Remove expedited GP funnel-lock bypass Commit #cdacbe1f91264 ("rcu: Add fastpath bypassing funnel locking") turns out to be a pessimization at high load because it forces a tree full of tasks to wait for an expedited grace period that they probably do not need. This commit therefore removes this optimization. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:07 -07:00
Paul E. McKenney	4f41530245	rcu: Add expedited-grace-period event tracing Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:06 -07:00
Paul E. McKenney	bea2de44ae	rcu: Add funnel-locking tracing for expedited grace periods Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:06 -07:00
Paul E. McKenney	a1e1224849	rcu: Make cond_resched_rcu_qs() supply RCU-sched expedited QS Although cond_resched_rcu_qs() supplies quiescent states to all flavors of normal RCU grace periods, it does nothing for expedited RCU-sched grace periods. This commit therefore adds a check for a need for a quiescent state from the current CPU by an expedited RCU-sched grace period, and invokes rcu_sched_qs() to supply that quiescent state if so. Note that the check is racy in that we might be migrated to some other CPU just after checking the per-CPU variable. This is OK because the act of migration will do a context switch, which will supply the needed quiescent state. The only downside is that we might do an unnecessary call to rcu_sched_qs(), but the probability is low and the overhead is small. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:04 -07:00
Paul E. McKenney	251c617c75	rcu: Make expedited RCU-preempt stall warnings count accurately Currently, synchronize_sched_expedited_wait() simply sets the ndetected variable to the rcu_print_task_exp_stall() return value. This means that if the last rcu_node structure has no stalled tasks, record of any stalled tasks in previous rcu_node structures is lost, which can in turn result in failure to dump out the blocking rcu_node structures. Or could, had the test been correct. This commit therefore adds the return value of rcu_print_task_exp_stall() to ndetected and corrects the later test for ndetected. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:03 -07:00
Paul E. McKenney	28728dd310	rcu: Make expedited RCU-sched grace period immediately detect idle Currently, sync_sched_exp_handler() will force a reschedule unless this CPU has already checked in or unless a reschedule has already been called for. This is clearly wasteful if sync_sched_exp_handler() interrupted an idle CPU, so this commit immediately reports the quiescent state in that case. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:34:03 -07:00
Paul E. McKenney	274529ba9b	rcu: Consolidate dumping of ftrace buffer This commit consolidates a couple of definitions and several calls for single-shot ftrace-buffer dumping. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-03-31 13:29:08 -07:00
Linus Torvalds	710d60cbf1	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull cpu hotplug updates from Thomas Gleixner: "This is the first part of the ongoing cpu hotplug rework: - Initial implementation of the state machine - Runs all online and prepare down callbacks on the plugged cpu and not on some random processor - Replaces busy loop waiting with completions - Adds tracepoints so the states can be followed" More detailed commentary on this work from an earlier email: "What's wrong with the current cpu hotplug infrastructure? - Asymmetry The hotplug notifier mechanism is asymmetric versus the bringup and teardown. This is mostly caused by the notifier mechanism. - Largely undocumented dependencies While some notifiers use explicitly defined notifier priorities, we have quite a few notifiers which use numerical priorities to express dependencies without any documentation of why. - Control processor driven Most of the bringup/teardown of a cpu is driven by a control processor. While it is understandable that preparatory steps, like idle thread creation, memory allocation for and initialization of essential facilities, need to be done before a cpu can boot, there is no reason why everything else must run on a control processor. Before this patch series, bringup looks like this: Control CPU Booting CPU do preparatory steps kick cpu into life do low level init sync with booting cpu sync with control cpu bring the rest up - All or nothing approach There is no way to do partial bringups. That's something which is really desired because we waste, e.g. at boot, a substantial amount of time just busy waiting for the cpu to come to life. That's stupid as we could very well do preparatory steps and the initial IPI for other cpus and then go back and do the necessary low level synchronization with the freshly booted cpu.
- Minimal debuggability Due to the notifier based design, it's impossible to switch between two stages of the bringup/teardown back and forth in order to test the correctness. So in many hotplug notifiers the cancel mechanisms are either nonexistent or completely untested. - Notifier [un]registering is tedious To [un]register notifiers we need to protect against hotplug at every callsite. There is no mechanism to issue bringup/teardown callbacks on the online cpus, so every caller needs to do it itself. That also includes error rollback. What's the new design? The base of the new design is a symmetric state machine, where both the control processor and the booting/dying cpu execute a well defined set of states. Each state is symmetric in the end, except for some well defined exceptions, and the bringup/teardown can be stopped and reversed at almost all states. So the bringup of a cpu will look like this in the future: Control CPU Booting CPU do preparatory steps kick cpu into life do low level init sync with booting cpu sync with control cpu bring itself up The synchronization step does not require the control cpu to wait. That mechanism can be done asynchronously via a worker or some other mechanism. The teardown can be made very similar, so that the dying cpu cleans up and brings itself down. Cleanups which need to be done after the cpu is gone can be scheduled asynchronously as well. There is a long way to this, as we need to refactor the notion of when a cpu is available. Today we set the cpu online right after it comes out of the low level bringup, which is not really correct. The proper mechanism is to set it to available, i.e. cpu local threads, like softirqd, hotplug thread etc. can be scheduled on that cpu, and once it has finished all booting steps, it's set to online, so general workloads can be scheduled on it. The reverse happens on teardown.
First thing to do is to forbid scheduling of general workloads, then tear down all the per cpu resources and finally shut it off completely. This patch series implements the basic infrastructure for this at the core level. This includes the following: - Basic state machine implementation with well defined states, so ordering and prioritization can be expressed. - Interfaces to [un]register state callbacks This invokes the bringup/teardown callback on all online cpus with the proper protection in place and [un]installs the callbacks in the state machine array. For callbacks which have no particular ordering requirement we have a dynamic state space, so that drivers don't have to register an explicit hotplug state. If a callback fails, the code automatically does a rollback to the previous state. - Sysfs interface to drive the state machine to a particular step. This is only partially functional today. Full functionality and therefore testability will be achieved once we have converted all existing hotplug notifiers over to the new scheme. - Run all CPU_ONLINE/DOWN_PREPARE notifiers on the booting/dying processor: Control CPU Booting CPU do preparatory steps kick cpu into life do low level init sync with booting cpu sync with control cpu wait for boot bring itself up Signal completion to control cpu In a previous step of this work we've done a full tree mechanical conversion of all hotplug notifiers to the new scheme. The balance is a net removal of about 4000 lines of code. This is not included in this series, as we decided to take a different approach. Instead of mechanically converting everything over, we will do a proper overhaul of the usage sites one by one so they nicely fit into the symmetric callback scheme. I decided to do that after I looked at the ugliness of some of the converted sites and figured out that their hotplug mechanism is completely buggered anyway.
So there is no point in doing a mechanical conversion first as we need to go through the usage sites one by one again in order to achieve a fully symmetric and testable behaviour" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) cpu/hotplug: Document states better cpu/hotplug: Fix smpboot thread ordering cpu/hotplug: Remove redundant state check cpu/hotplug: Plug death reporting race rcu: Make CPU_DYING_IDLE an explicit call cpu/hotplug: Make wait for dead cpu completion based cpu/hotplug: Let upcoming cpu bring itself fully up arch/hotplug: Call into idle with a proper state cpu/hotplug: Move online calls to hotplugged cpu cpu/hotplug: Create hotplug threads cpu/hotplug: Split out the state walk into functions cpu/hotplug: Unpark smpboot threads from the state machine cpu/hotplug: Move scheduler cpu_online notifier to hotplug core cpu/hotplug: Implement setup/removal interface cpu/hotplug: Make target state writeable cpu/hotplug: Add sysfs state interface cpu/hotplug: Hand in target state to _cpu_up/down cpu/hotplug: Convert the hotplugged cpu work to a state machine cpu/hotplug: Convert to a state machine for the control processor cpu/hotplug: Add tracepoints ...	2016-03-15 13:50:29 -07:00
Ingo Molnar	8bc6782fe2	Merge commit 'fixes.2015.02.23a' into core/rcu Conflicts: kernel/rcu/tree.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-15 09:01:06 +01:00
Thomas Gleixner	27d50c7eeb	rcu: Make CPU_DYING_IDLE an explicit call Make the RCU CPU_DYING_IDLE callback an explicit function call, so it gets invoked at the proper place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Rik van Riel <riel@redhat.com> Cc: Rafael Wysocki <rafael.j.wysocki@intel.com> Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20160226182341.870167933@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-03-01 20:36:58 +01:00
Paul Gortmaker	abedf8e241	rcu: Use simple wait queues where possible in rcutree As of commit `dae6e64d2b` ("rcu: Introduce proper blocking to no-CBs kthreads GP waits") the RCU subsystem started making use of wait queues. Here we convert all additions of RCU wait queues to use simple wait queues, since they don't need the extra overhead of the full wait queue features. Originally this was done for RT kernels[1], since we would get things like... BUG: sleeping function called from invalid context at kernel/rtmutex.c:659 in_atomic(): 1, irqs_disabled(): 1, pid: 8, name: rcu_preempt Pid: 8, comm: rcu_preempt Not tainted Call Trace: [<ffffffff8106c8d0>] __might_sleep+0xd0/0xf0 [<ffffffff817d77b4>] rt_spin_lock+0x24/0x50 [<ffffffff8106fcf6>] __wake_up+0x36/0x70 [<ffffffff810c4542>] rcu_gp_kthread+0x4d2/0x680 [<ffffffff8105f910>] ? __init_waitqueue_head+0x50/0x50 [<ffffffff810c4070>] ? rcu_gp_fqs+0x80/0x80 [<ffffffff8105eabb>] kthread+0xdb/0xe0 [<ffffffff8106b912>] ? finish_task_switch+0x52/0x100 [<ffffffff817e0754>] kernel_thread_helper+0x4/0x10 [<ffffffff8105e9e0>] ? __init_kthread_worker+0x60/0x60 [<ffffffff817e0750>] ? gs_change+0xb/0xb ...and hence simple wait queues were deployed on RT out of necessity (as simple wait uses a raw lock), but mainline might as well take advantage of the more streamlined support as well. [1] This is a carry forward of work from v3.10-rt; the original conversion was by Thomas on an earlier -rt version, and Sebastian extended it to additional post-3.10 added RCU waiters; here I've added a commit log and unified the RCU changes into one, and uprev'd it to match mainline RCU.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-rt-users@vger.kernel.org Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1455871601-27484-6-git-send-email-wagi@monom.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-02-25 11:27:16 +01:00
Daniel Wagner	065bb78c5b	rcu: Do not call rcu_nocb_gp_cleanup() while holding rnp->lock rcu_nocb_gp_cleanup() is called while holding rnp->lock. Currently, this is okay because the wake_up_all() in rcu_nocb_gp_cleanup() will not enable the IRQs. lockdep is happy. By switching over to swait this is not true anymore. swake_up_all() enables the IRQs while processing the waiters. __do_softirq() can now run and will eventually call rcu_process_callbacks() which wants to grab rnp->lock. Let's move the rcu_nocb_gp_cleanup() call outside the lock before we switch over to swait. If we hold rnp->lock and use swait, lockdep reports the following: ================================= [ INFO: inconsistent lock state ] 4.2.0-rc5-00025-g9a73ba0 #136 Not tainted --------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. rcu_preempt/8 [HC0[0]:SC0[0]:HE1:SE1] takes: (rcu_node_1){+.?...}, at: [<ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0 {IN-SOFTIRQ-W} state was registered at: [<ffffffff81109b9f>] __lock_acquire+0xd5f/0x21e0 [<ffffffff8110be0f>] lock_acquire+0xdf/0x2b0 [<ffffffff81841cc9>] _raw_spin_lock_irqsave+0x59/0xa0 [<ffffffff81136991>] rcu_process_callbacks+0x141/0x3c0 [<ffffffff810b1a9d>] __do_softirq+0x14d/0x670 [<ffffffff810b2214>] irq_exit+0x104/0x110 [<ffffffff81844e96>] smp_apic_timer_interrupt+0x46/0x60 [<ffffffff81842e70>] apic_timer_interrupt+0x70/0x80 [<ffffffff810dba66>] rq_attach_root+0xa6/0x100 [<ffffffff810dbc2d>] cpu_attach_domain+0x16d/0x650 [<ffffffff810e4b42>] build_sched_domains+0x942/0xb00 [<ffffffff821777c2>] sched_init_smp+0x509/0x5c1 [<ffffffff821551e3>] kernel_init_freeable+0x172/0x28f [<ffffffff8182cdce>] kernel_init+0xe/0xe0 [<ffffffff8184231f>] ret_from_fork+0x3f/0x70 irq event stamp: 76 hardirqs last enabled at (75): [<ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60 hardirqs last disabled at (76): [<ffffffff8184116f>] _raw_spin_lock_irq+0x1f/0x90 softirqs last enabled at (0): [<ffffffff810a8df2>]
copy_process.part.26+0x602/0x1cf0 softirqs last disabled at (0): [< (null)>] (null) other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(rcu_node_1); <Interrupt> lock(rcu_node_1); * DEADLOCK * 1 lock held by rcu_preempt/8: #0: (rcu_node_1){+.?...}, at: [<ffffffff811387c7>] rcu_gp_kthread+0xb97/0xeb0 stack backtrace: CPU: 0 PID: 8 Comm: rcu_preempt Not tainted 4.2.0-rc5-00025-g9a73ba0 #136 Hardware name: Dell Inc. PowerEdge R820/066N7P, BIOS 2.0.20 01/16/2014 0000000000000000 000000006d7e67d8 ffff881fb081fbd8 ffffffff818379e0 0000000000000000 ffff881fb0812a00 ffff881fb081fc38 ffffffff8110813b 0000000000000000 0000000000000001 ffff881f00000001 ffffffff8102fa4f Call Trace: [<ffffffff818379e0>] dump_stack+0x4f/0x7b [<ffffffff8110813b>] print_usage_bug+0x1db/0x1e0 [<ffffffff8102fa4f>] ? save_stack_trace+0x2f/0x50 [<ffffffff811087ad>] mark_lock+0x66d/0x6e0 [<ffffffff81107790>] ? check_usage_forwards+0x150/0x150 [<ffffffff81108898>] mark_held_locks+0x78/0xa0 [<ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff81108a28>] trace_hardirqs_on_caller+0x168/0x220 [<ffffffff81108aed>] trace_hardirqs_on+0xd/0x10 [<ffffffff81841330>] _raw_spin_unlock_irq+0x30/0x60 [<ffffffff810fd1c7>] swake_up_all+0xb7/0xe0 [<ffffffff811386e1>] rcu_gp_kthread+0xab1/0xeb0 [<ffffffff811089bf>] ? trace_hardirqs_on_caller+0xff/0x220 [<ffffffff81841341>] ? _raw_spin_unlock_irq+0x41/0x60 [<ffffffff81137c30>] ? rcu_barrier+0x20/0x20 [<ffffffff810d2014>] kthread+0x104/0x120 [<ffffffff81841330>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff810d1f10>] ? kthread_create_on_node+0x260/0x260 [<ffffffff8184231f>] ret_from_fork+0x3f/0x70 [<ffffffff810d1f10>] ? 
kthread_create_on_node+0x260/0x260 Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-rt-users@vger.kernel.org Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1455871601-27484-5-git-send-email-wagi@monom.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-02-25 11:27:16 +01:00
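The fix above follows a general rule: if a wakeup can lead to code that takes the same lock, issue it only after the lock has been dropped. The user-space sketch below illustrates that rule with POSIX mutexes rather than kernel spinlocks and softirqs; `struct model_node`, `cleanup_can_take_lock()` and `gp_cleanup_node()` are hypothetical, and `pthread_mutex_trylock()` stands in for the re-entrant acquisition that would deadlock in the kernel. Depending on the toolchain, it may need `-pthread` to link.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an rcu_node and its ->lock. */
struct model_node {
	pthread_mutex_t lock;
	unsigned long gp_seq;
};

/* Models cleanup work that may end up taking np->lock again (as the
 * softirq path into rcu_process_callbacks() could). trylock fails with
 * EBUSY while the lock is held, which here signals a would-be deadlock. */
static bool cleanup_can_take_lock(struct model_node *np)
{
	if (pthread_mutex_trylock(&np->lock) != 0)
		return false;
	pthread_mutex_unlock(&np->lock);
	return true;
}

/* Update shared state under the lock, then run the cleanup after unlocking. */
static bool gp_cleanup_node(struct model_node *np)
{
	pthread_mutex_lock(&np->lock);
	np->gp_seq++;
	pthread_mutex_unlock(&np->lock);
	return cleanup_can_take_lock(np); /* lock already dropped: succeeds */
}
```

Calling `cleanup_can_take_lock()` while still holding `np->lock` returns false, which corresponds to the "lock(rcu_node_1); <Interrupt> lock(rcu_node_1);" scenario in the lockdep report.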
Paul E. McKenney	4b455dc3e1	rcu: Catch up rcu_report_qs_rdp() comment with reality Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-02-23 19:59:56 -08:00
Boqun Feng	67c583a7de	RCU: Privatize rcu_node::lock In patch "rcu: Add transitivity to remaining rcu_node ->lock acquisitions", all locking operations on rcu_node::lock were replaced with the wrappers because of the need for transitivity, which means we should never write code using LOCK primitives alone (i.e. without a proper barrier following) on rcu_node::lock outside those wrappers. We could detect this kind of misuse of rcu_node::lock in the future by adding the __private modifier to rcu_node::lock. To privatize rcu_node::lock, unlock wrappers are also needed. Replacing spinlock unlocks with these wrappers not only privatizes rcu_node::lock but also makes it easier to figure out critical sections of rcu_node. This patch adds the __private modifier to rcu_node::lock and wraps every access to it in ACCESS_PRIVATE(). In addition, unlock wrappers are added, and raw_spin_unlock(&rnp->lock) and its friends are replaced with those wrappers. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-02-23 19:59:54 -08:00
Chen Gang	1914aab543	rcu: Remove useless rcu_data_p when !PREEMPT_RCU The related warning from gcc 6.0: In file included from kernel/rcu/tree.c:4630:0: kernel/rcu/tree_plugin.h:810:40: warning: ‘rcu_data_p’ defined but not used [-Wunused-const-variable] static struct rcu_data __percpu *const rcu_data_p = &rcu_sched_data; ^~~~~~~~~~ Also remove always redundant rcu_data_p in tree.c. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-02-23 19:59:53 -08:00
Paul E. McKenney	23a9bacd35	rcu: Set rdp->gpwrap when CPU is idle Commit #e3663b1024d1 ("rcu: Handle gpnum/completed wrap while dyntick idle") sets rdp->gpwrap on the wrong side of the "if" statement in dyntick_save_progress_counter(), that is, it sets it when the CPU is not idle instead of when it is idle. Of course, if the CPU is not idle, its rdp->gpnum won't be lagging behind the global rsp->gpnum, which means that rdp->gpwrap will never be set. This commit therefore moves this code to the proper leg of that "if" statement. This change means that the "else" clause is just "return 0" and the "then" clause ends with "return 1", so also move the "return 0" to follow the "if", dropping the "else" clause. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-02-23 19:59:52 -08:00
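The corrected control flow can be sketched as follows. This is a simplified user-space model, not the kernel function: `model_save_progress()` and `struct model_rdp` are hypothetical, the CPU's idleness is passed in as a flag, and the wrap threshold of a quarter of the counter space is an assumption made for illustration. `ULONG_CMP_LT()` follows the wrap-safe comparison style RCU uses for its free-running unsigned counters.

```c
#include <limits.h>
#include <stdbool.h>

/* Wrap-safe "a < b" for free-running unsigned long counters. */
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

/* Hypothetical per-CPU state. */
struct model_rdp {
	unsigned long gpnum; /* last grace period this CPU knows about */
	bool gpwrap;         /* set if gpnum may have wrapped relative to global */
};

/* Returns 1 if the CPU is idle (quiescent state reported), else 0.
 * gpwrap is checked only on the idle leg: an idle CPU is the one whose
 * gpnum can fall far behind the global counter. */
static int model_save_progress(struct model_rdp *rdp, bool cpu_idle,
			       unsigned long global_gpnum)
{
	if (cpu_idle) {
		/* Assumed threshold: a lag of more than ULONG_MAX / 4. */
		if (ULONG_CMP_LT(rdp->gpnum + ULONG_MAX / 4, global_gpnum))
			rdp->gpwrap = true;
		return 1;
	}
	return 0;
}
```

As in the commit, the "then" leg ends with "return 1" and the former "else" leg collapses into the trailing "return 0".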
Paul E. McKenney	4914950aaa	rcu: Stop treating in-kernel CPU-bound workloads as errors Commit `4a81e8328d` ("Reduce overhead of cond_resched() checks for RCU") handles the error case where a nohz_full CPU loops indefinitely in the kernel with the scheduling-clock interrupt disabled. However, this handling includes IPIing the CPU running the offending loop, which is not what we want for real-time workloads. And there are starting to be real-time CPU-bound in-kernel workloads, and these must be handled without IPIing the CPU, at least in the common case. Therefore, this situation can no longer be dismissed as an error case. This commit therefore splits the handling out, so that the setting of bits in the per-CPU rcu_sched_qs_mask variable is done relatively early, but if the problem persists, resched_cpu() is eventually used to IPI the CPU containing the offending loop. Assuming that in-kernel CPU-bound loops used by real-time tasks contain frequent calls to cond_resched_rcu_qs() (as in more than once per few tens of milliseconds), the real-time tasks will never be IPIed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org>	2016-02-23 19:59:51 -08:00
Paul E. McKenney	8994515cf0	rcu: Update rcu_report_qs_rsp() comment The header comment for rcu_report_qs_rsp() was obsolete, dating well before the advent of RCU grace-period kthreads. This commit therefore brings this comment back into alignment with current reality. Reported-by: Lihao Liang <lihao.liang@cs.ox.ac.uk> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-02-23 19:59:51 -08:00
Paul E. McKenney	bb53e416e0	rcu: Assign false instead of 0 for ->core_needs_qs A zero seems to have escaped earlier true/false substitution efforts, so this commit changes 0 to false for the ->core_needs_qs boolean field. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-02-23 19:59:51 -08:00
Paul E. McKenney	648c630c64	Merge branches 'doc.2015.12.05a', 'exp.2015.12.07a', 'fixes.2015.12.07a', 'list.2015.12.04b' and 'torture.2015.12.05a' into HEAD doc.2015.12.05a: Documentation updates exp.2015.12.07a: Expedited grace-period updates fixes.2015.12.07a: Miscellaneous fixes list.2015.12.04b: Linked-list updates torture.2015.12.05a: Torture-test updates	2015-12-07 17:02:54 -08:00
Paul E. McKenney	45fed3e7cf	rcu: Make rcu_gp_init() be bool rather than int The return value from rcu_gp_init() is always used as a bool, so this commit makes it be a bool. Reported-by: Iftekhar Ahmed <ahmedi@oregonstate.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-12-07 17:01:33 -08:00
Peter Zijlstra	e11f13355b	rcu: Move wakeup out from under rnp->lock This patch removes a potential deadlock hazard by moving the wake_up_process() in rcu_spawn_gp_kthread() out from under rnp->lock. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-12-07 17:01:32 -08:00
Paul E. McKenney	7c9906ca5e	rcu: Don't redundantly disable irqs in rcu_irq_{enter,exit}() This commit replaces a local_irq_save()/local_irq_restore() pair with a lockdep assertion that interrupts are already disabled. This should remove the corresponding overhead from the interrupt entry/exit fastpaths. This change was inspired by the fact that Iftekhar Ahmed's mutation testing showed that removing rcu_irq_enter()'s call to local_irq_restore() had no effect, which might indicate that interrupts were always disabled anyway. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-12-07 17:01:31 -08:00
Paul E. McKenney	d117c8aa1d	rcu: Make cpu_needs_another_gp() be bool The cpu_needs_another_gp() function is currently of type int, but only returns zero or one. Bow to reality and make it be of type bool. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-12-07 17:01:31 -08:00
Paul E. McKenney	a87f203e27	rcu: Eliminate unused rcu_init_one() argument Now that the rcu_state structure's ->rda field is compile-time initialized, there is no need to pass the per-CPU rcu_data structure into rcu_init_one(). This commit therefore eliminates this now-unused parameter. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-12-07 17:01:19 -08:00
Paul E. McKenney	6b50e119c4	rcutorture: Print symbolic name for ->gp_state Currently, ->gp_state is printed as an integer, which slows debugging. This commit therefore prints a symbolic name in addition to the integer. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Updated to fix relational operator called out by Dan Carpenter. ] [ paulmck: More "const", as suggested by Josh Triplett. ] Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2015-12-05 17:58:26 -08:00
Paul E. McKenney	b1adb3e273	rcutorture: Dump stack when GP kthread stalls This commit increases debug information in the case where the grace-period kthread is being prevented from running by dumping that kthread's stack. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Split into prior commit and this commit, as suggested by Josh Triplett. ] Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2015-12-05 17:58:05 -08:00
Paul E. McKenney	a0e3a3aa28	rcutorture: Flag nonexistent RCU GP kthread Currently, if the RCU grace-period kthread has not yet been created, the starvation-check code will print zero for its state, which maps to TASK_RUNNING. This could clearly be quite confusing, so this commit prints ~0, which does not map to any legal ->state value. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2015-12-05 17:58:00 -08:00
Paul E. McKenney	46a5d164db	rcu: Stop disabling interrupts in scheduler fastpaths We need the scheduler's fastpaths to be, well, fast, and unnecessarily disabling and re-enabling interrupts is not necessarily consistent with this goal. Especially given that there are regions of the scheduler that already have interrupts disabled. This commit therefore moves the call to rcu_note_context_switch() to one of the interrupts-disabled regions of the scheduler, and removes the now-redundant disabling and re-enabling of interrupts from rcu_note_context_switch() and the functions it calls. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Shift rcu_note_context_switch() to avoid deadlock, as suggested by Peter Zijlstra. ]	2015-12-04 12:27:31 -08:00
Paul E. McKenney	fecbf6f01f	rcu: Simplify rcu_sched_qs() control flow This commit applies an early-exit approach to rcu_sched_qs(), reducing the nesting level and saving a line of code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-12-04 12:27:29 -08:00
Paul E. McKenney	3dc5dbe9a1	rcu: Move lock_class_key to local scope Currently, the rcu_node_class[], rcu_fqs_class[], and rcu_exp_class[] arrays needlessly pollute the global namespace within tree.c. This commit therefore converts them to static local variables within rcu_init_one(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-12-04 12:27:29 -08:00
Paul E. McKenney	5a9be7c628	rcu: Add rcu_normal kernel parameter to suppress expediting Although expedited grace periods can be quite useful, and although their OS jitter has been greatly reduced, they can still pose problems for extreme real-time workloads. This commit therefore adds a rcu_normal kernel boot parameter (which can also be manipulated via sysfs) to suppress expedited grace periods, that is, to treat requests for expedited grace periods as if they were requests for normal grace periods. If both rcu_expedited and rcu_normal are specified, rcu_normal wins. This means that if you are relying on expedited grace periods to speed up boot, you will want to specify rcu_expedited on the kernel command line, and then specify rcu_normal via sysfs once boot completes. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-12-04 12:26:53 -08:00
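The precedence rule can be sketched as a small decision function. The flag variables and `choose_gp()` below are hypothetical; the point is only that `rcu_normal` is tested first, so it overrides both the `rcu_expedited` parameter and explicit expedited requests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags mirroring the rcu_expedited and rcu_normal parameters. */
static bool rcu_expedited_flag;
static bool rcu_normal_flag;

enum gp_kind { GP_NORMAL, GP_EXPEDITED };

/* Decide how to service a grace-period request. */
static enum gp_kind choose_gp(bool caller_wants_expedited)
{
	if (rcu_normal_flag)          /* rcu_normal wins over everything */
		return GP_NORMAL;
	if (caller_wants_expedited || rcu_expedited_flag)
		return GP_EXPEDITED;
	return GP_NORMAL;
}
```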
Paul E. McKenney	72611ab9f5	rcu: Add more diagnostics to expedited stall warning messages. This commit adds print statements that check the rcu_node structure to find which ->expmask bits and which ->exp_tasks structures are blocking the current expedited grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-12-04 12:26:53 -08:00
Paul E. McKenney	73f36f9de8	rcu: Make expedited grace periods resolve stall-warning ties Currently, if a grace period ends just as the stall-warning timeout fires, an empty stall warning will be printed. This is not helpful, so this commit avoids these useless warnings by rechecking completion after awakening in synchronize_sched_expedited_wait(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-12-04 12:26:52 -08:00
Paul E. McKenney	df5bd5144a	rcu: Reduce expedited GP memory contention via per-CPU variables Currently, the piggybacked-work checks carried out by sync_exp_work_done() atomically increment a small set of variables (the ->expedited_workdone0, ->expedited_workdone1, ->expedited_workdone2, ->expedited_workdone3 fields in the rcu_state structure), which will form a memory-contention bottleneck given a sufficiently large number of CPUs concurrently invoking either synchronize_rcu_expedited() or synchronize_sched_expedited(). This commit therefore moves these four fields to the per-CPU rcu_data structure, eliminating the memory contention. The show_rcuexp() function also changes to sum up each field in the rcu_data structures. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-12-04 12:26:52 -08:00
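A sketch of the per-CPU counter layout, assuming a hypothetical four-CPU array of cache-line-aligned slots in place of the real per-CPU `rcu_data` fields. Each CPU increments only its own slot, and a reader (the role `show_rcuexp()` plays) sums the slots. Index `which` corresponds to `->expedited_workdone0` through `->expedited_workdone3`; the 64-byte line size is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS_SKETCH 4
#define CACHELINE 64

/* Hypothetical per-CPU slot, aligned so CPUs never share a cache line. */
struct exp_stats {
	_Alignas(CACHELINE) atomic_ulong workdone[4];
};

static struct exp_stats per_cpu_stats[NR_CPUS_SKETCH];

/* Fast path: a CPU bumps only its own counter, so there is no contention. */
static void note_workdone(int cpu, int which)
{
	atomic_fetch_add_explicit(&per_cpu_stats[cpu].workdone[which], 1,
				  memory_order_relaxed);
}

/* Slow path: sum one field across all CPUs. */
static unsigned long sum_workdone(int which)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += atomic_load_explicit(&per_cpu_stats[cpu].workdone[which],
					    memory_order_relaxed);
	return sum;
}
```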
Paul E. McKenney	1307f21487	rcu: Invert sync_rcu_exp_select_cpus() "if" statement This commit saves a couple of lines of code and reduces indentation by inverting the sense of an "if" statement in the function sync_rcu_exp_select_cpus(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-12-04 12:26:51 -08:00
Paul E. McKenney	886ef5a18a	rcu: Move smp_mb() from rcu_seq_snap() to rcu_exp_gp_seq_snap() The memory barrier in rcu_seq_snap() is needed only for grace periods, so this commit moves it to the grace-period-oriented wrapper rcu_exp_gp_seq_snap(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-12-04 12:26:51 -08:00
Paul E. McKenney	06f60de19d	rcu: Short-circuit synchronize_sched_expedited() if only one CPU If there is only one CPU, then invoking synchronize_sched_expedited() is by definition a grace period. This commit checks for this condition and does a short-circuit return in that case. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-12-04 12:26:50 -08:00
Paul E. McKenney	6cf1008122	rcu: Add transitivity to remaining rcu_node ->lock acquisitions The rule is that all acquisitions of the rcu_node structure's ->lock must provide transitivity: The lock is not acquired that frequently, and sorting out exactly which required it and which did not would be a maintenance nightmare. This commit therefore supplies the needed transitivity to the remaining ->lock acquisitions. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-11-23 10:37:35 -08:00
Peter Zijlstra	2a67e741bb	rcu: Create transitive rnp->lock acquisition functions Providing RCU's memory-ordering guarantees requires that the rcu_node tree's locking provide transitive memory ordering, which the Linux kernel's spinlocks currently do not provide unless smp_mb__after_unlock_lock() is used. Having a separate smp_mb__after_unlock_lock() after each and every lock acquisition is error-prone, hard to read, and a bit annoying, so this commit provides wrapper functions that pull in the smp_mb__after_unlock_lock() invocations. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-11-23 10:37:35 -08:00
Paul E. McKenney	d2856b046d	Merge branches 'fixes.2015.10.06a' and 'exp.2015.10.07a' into HEAD exp.2015.10.07a: Reduce OS jitter of RCU-sched expedited grace periods. fixes.2015.10.06a: Miscellaneous fixes.	2015-10-07 16:05:21 -07:00
Paul E. McKenney	338b0f760e	rcu: Better hotplug handling for synchronize_sched_expedited() Earlier versions of synchronize_sched_expedited() can prematurely end grace periods due to the fact that a CPU marked as cpu_is_offline() can still be using RCU read-side critical sections during the time that CPU makes its last pass through the scheduler and into the idle loop and during the time that a given CPU is in the process of coming online. This commit therefore eliminates this window by adding additional interaction with the CPU-hotplug operations. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-10-07 16:02:50 -07:00
Paul E. McKenney	c58656382e	rcu: Add tasks to expedited stall-warning messages This commit adds task-print ability to the expedited RCU CPU stall warning messages in preparation for adding stall warnings to synchronize_rcu_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-10-07 16:02:50 -07:00
Paul E. McKenney	74611ecb0f	rcu: Add online/offline info to expedited stall warning message This commit makes the expedited RCU CPU stall warning message print three online/offline indicators immediately after the CPU number: an "O" indicates that the CPU is globally offline ("." if online), an "o" indicates that RCU believes that the CPU is offline for the current grace period ("." otherwise), and an "N" indicates that RCU believes that the CPU will be offline for the next grace period ("." otherwise). So for CPU 10, you would normally see "10-...:", indicating that everything believes that the CPU is online. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-10-07 16:02:50 -07:00
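One compact way to produce the indicators described above, using a hypothetical `format_cpu_flags()` helper: indexing a two-character string literal with a boolean picks the offline letter when the flag is false and "." when it is true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: "O" = globally offline, "o" = offline for the
 * current grace period, "N" = offline for the next one; "." otherwise. */
static void format_cpu_flags(char *buf, size_t len, int cpu, bool online,
			     bool online_this_gp, bool online_next_gp)
{
	snprintf(buf, len, "%d-%c%c%c", cpu,
		 "O."[online], "o."[online_this_gp], "N."[online_next_gp]);
}
```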
Paul E. McKenney	dcdb8807ba	rcu: Consolidate expedited CPU selection Now that sync_sched_exp_select_cpus() and sync_rcu_exp_select_cpus() are identical aside from the argument to smp_call_function_single(), this commit consolidates them with a functional argument. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-10-07 16:02:50 -07:00
Paul E. McKenney	66fe6cbee4	rcu: Prepare for consolidating expedited CPU selection This commit brings sync_sched_exp_select_cpus() into alignment with sync_rcu_exp_select_cpus(), as a first step towards consolidating them into one function. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-10-07 16:02:50 -07:00
Paul E. McKenney	807226e2fb	rcu: Stop excluding CPU hotplug in synchronize_sched_expedited() Now that synchronize_sched_expedited() uses IPIs, a hook in rcu_sched_qs(), and the ->expmask field in the rcu_node combining tree, it is no longer necessary to exclude CPU hotplug. Any races with CPU hotplug will be detected when attempting to send the IPI. This commit therefore removes the code excluding CPU hotplug operations. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-10-07 16:02:49 -07:00
Paul E. McKenney	83c2c735e7	rcu: Stop silencing lockdep false positive for expedited grace periods This reverts commit `af859beaab` (rcu: Silence lockdep false positive for expedited grace periods). Because synchronize_rcu_expedited() no longer invokes synchronize_sched_expedited(), ->exp_funnel_mutex acquisition is no longer nested, so the false positive no longer happens. This commit therefore removes the extra lockdep data structures, as they are no longer needed.	2015-10-07 16:02:49 -07:00
Paul E. McKenney	6587a23b6b	rcu: Switch synchronize_sched_expedited() to IPI This commit switches synchronize_sched_expedited() from stop_one_cpu_nowait() to smp_call_function_single(), thus moving from an IPI and a pair of context switches to an IPI and a single pass through the scheduler. Of course, if the scheduler actually does decide to switch to a different task, there will still be a pair of context switches, but there would likely have been a pair of context switches anyway, just a bit later. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-10-07 16:01:12 -07:00
Petr Mladek	77f81fe08e	rcu: Finish folding ->fqs_state into ->gp_state Commit `4cdfc175c2` ("rcu: Move quiescent-state forcing into kthread") started the process of folding the old ->fqs_state into ->gp_state, but did not complete it. This situation does not cause any malfunction, but can result in extremely confusing trace output. This commit completes this task of eliminating ->fqs_state in favor of ->gp_state. The old ->fqs_state was also used to decide when to collect dyntick-idle snapshots. For this purpose, we add a boolean variable to the kthread, which is set on the first call to rcu_gp_fqs() for a given grace period and cleared otherwise. Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2015-10-06 11:15:59 -07:00
Paul E. McKenney	ee968ac61d	rcu: Eliminate panic when silly boot-time fanout specified This commit loosens rcutree.rcu_fanout_leaf range checks and replaces a panic() with a fallback to compile-time values. This fallback is accompanied by a WARN_ON(), and both occur when the rcutree.rcu_fanout_leaf value is too small to accommodate the number of CPUs. For example, given the current four-level limit for the rcu_node tree, a system with more than 16 CPUs built with CONFIG_RCU_FANOUT=2 must have rcutree.rcu_fanout_leaf larger than 2. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2015-10-06 11:09:41 -07:00
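The arithmetic behind the example can be checked directly. In this sketch, `tree_capacity()` and `pick_leaf_fanout()` are hypothetical helpers: capacity is the leaf fanout multiplied by the interior fanout once per additional level, so a four-level tree with fanout 2 and leaf fanout 2 covers 2 * 2 * 2 * 2 = 16 CPUs. Only the capacity check described in the commit is modeled, not the kernel's other range checks.

```c
#include <assert.h>

/* CPUs coverable when each leaf takes `leaf` CPUs and each interior
 * level multiplies capacity by `fanout`. */
static unsigned long tree_capacity(unsigned long leaf, unsigned long fanout,
				   int levels)
{
	unsigned long cap = leaf;

	for (int i = 1; i < levels; i++)
		cap *= fanout;
	return cap;
}

/* Use the boot-time leaf fanout only if the tree can hold every CPU;
 * otherwise fall back to the compile-time value (the kernel also
 * emits a WARN_ON() in that case). */
static unsigned long pick_leaf_fanout(unsigned long requested,
				      unsigned long compiled_leaf,
				      unsigned long fanout, int max_levels,
				      unsigned long nr_cpus)
{
	if (tree_capacity(requested, fanout, max_levels) < nr_cpus)
		return compiled_leaf;
	return requested;
}
```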
Boqun Feng	bb73c52bad	rcu: Don't disable preemption for Tiny and Tree RCU readers Because preempt_disable() maps to barrier() for non-debug builds, it forces the compiler to spill and reload registers. Because Tree RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these barrier() instances generate needless extra code for each instance of rcu_read_lock() and rcu_read_unlock(). This extra code slows down Tree RCU and bloats Tiny RCU. This commit therefore removes the preempt_disable() and preempt_enable() from the non-preemptible implementations of __rcu_read_lock() and __rcu_read_unlock(), respectively. However, for debug purposes, preempt_disable() and preempt_enable() are still invoked if CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside atomic sections in non-preemptible kernels. However, Tiny and Tree RCU operate by coalescing all RCU read-side critical sections on a given CPU that lie between successive quiescent states. It is therefore necessary to compensate for removing barriers from __rcu_read_lock() and __rcu_read_unlock() by adding them to a couple of the RCU functions invoked during quiescent states, namely to rcu_all_qs() and rcu_note_context_switch(). However, note that the latter is more paranoia than necessity, at least until link-time optimizations become more aggressive. This is based on an earlier patch by Paul E. McKenney, fixing a bug encountered in kernels built with CONFIG_PREEMPT=n and CONFIG_PREEMPT_COUNT=y. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-10-06 11:08:23 -07:00
Boqun Feng	b6a4ae766e	rcu: Use rcu_callback_t in call_rcu() and friends Now that the rcu_callback_t typedef exists as the type of RCU callbacks, we should use it as the parameter type in call_rcu() and friends. This could save us a few lines of code and makes it clear which functions require an RCU callback rather than some other callback as their argument. In addition, this can help cscope generate a better database for code reading. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2015-10-06 11:08:05 -07:00
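A small userspace model of why the typedef helps: `call_rcu_sketch()` takes an `rcu_callback_t`, so both the compiler and a reader can see that only RCU callbacks belong there. The `struct rcu_head` layout and the typedef mirror the kernel's (where `rcu_head` aliases `struct callback_head`); `call_rcu_sketch()`, `invoke_callbacks()`, and `struct foo` are hypothetical, and `invoke_callbacks()` merely stands in for invocation after a grace period, which is not modeled.

```c
#include <assert.h>
#include <stddef.h>

struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};
typedef void (*rcu_callback_t)(struct rcu_head *head);

static struct rcu_head *cb_list;

/* Queue a callback; the parameter type documents what belongs here. */
static void call_rcu_sketch(struct rcu_head *head, rcu_callback_t func)
{
	head->func = func;
	head->next = cb_list;
	cb_list = head;
}

/* Run and dequeue every pending callback; returns how many ran. */
static int invoke_callbacks(void)
{
	int n = 0;

	while (cb_list) {
		struct rcu_head *rhp = cb_list;

		cb_list = rhp->next;
		rhp->func(rhp);
		n++;
	}
	return n;
}

struct foo {
	int value;
	int freed;
	struct rcu_head rh;
};

/* container_of()-style recovery of the enclosing object. */
static void foo_reclaim(struct rcu_head *head)
{
	struct foo *fp = (struct foo *)((char *)head - offsetof(struct foo, rh));

	fp->freed = 1; /* a real callback would free fp here */
}
```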
Paul E. McKenney	5b74c45890	rcu: Make ->cpu_no_qs be a union for aggregate OR This commit converts the rcu_data structure's ->cpu_no_qs field to a union. The bytewise side of this union allows individual access to indications as to whether this CPU needs to find a quiescent state for a normal (.norm) and/or expedited (.exp) grace period. The setwise side of the union allows testing whether or not a quiescent state is needed at all, for either type of grace period. For now, only .norm is used. A later commit will introduce the expedited usage. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-09-20 21:16:21 -07:00
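A sketch of the union layout described above, with `uint8_t`/`uint16_t` standing in for the kernel's `u8`/`u16`; the `b`/`s` member names and the `needs_any_qs()` helper are assumptions of this sketch. Setting either byte-sized flag makes the 16-bit view nonzero, so a single load answers "does this CPU owe any quiescent state?"

```c
#include <assert.h>
#include <stdint.h>

union rcu_noqs {
	struct {
		uint8_t norm; /* QS needed for a normal grace period */
		uint8_t exp;  /* QS needed for an expedited grace period */
	} b;                  /* bytewise view */
	uint16_t s;           /* setwise view: aggregate OR of both flags */
};

/* One 16-bit test covers both grace-period types. */
static int needs_any_qs(const union rcu_noqs *np)
{
	return np->s != 0;
}
```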
Paul E. McKenney	0d43eb34f9	rcu: Invert passed_quiesce and rename to cpu_no_qs This commit inverts the sense of the rcu_data structure's ->passed_quiesce field and renames it to ->cpu_no_qs. This will allow a later commit to use an "aggregate OR" operation to test expedited as well as normal grace periods without added overhead. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-09-20 21:16:21 -07:00
Paul E. McKenney	97c668b8e9	rcu: Rename qs_pending to core_needs_qs An upcoming commit needs to invert the sense of the ->passed_quiesce rcu_data structure field, so this commit is taking this opportunity to clarify things a bit by renaming ->qs_pending to ->core_needs_qs. So if !rdp->core_needs_qs, then this CPU need not concern itself with quiescent states; in particular, it need not acquire its leaf rcu_node structure's ->lock to check. Otherwise, it needs to report the next quiescent state. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-09-20 21:16:20 -07:00
Paul E. McKenney	bce5fa12aa	rcu: Move synchronize_sched_expedited() to combining tree Currently, synchronize_sched_expedited() uses a single global counter to track the number of remaining context switches that the current expedited grace period must wait on. This is problematic on large systems, where the resulting memory contention can be pathological. This commit therefore makes synchronize_sched_expedited() instead use the combining tree in the same manner as synchronize_rcu_expedited(), keeping memory contention down to a dull roar. This commit creates a temporary function sync_sched_exp_select_cpus() that is very similar to sync_rcu_exp_select_cpus(). A later commit will consolidate these two functions, which becomes possible when synchronize_sched_expedited() switches from stop_one_cpu_nowait() to smp_call_function_single(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-09-20 21:16:20 -07:00
Paul E. McKenney	8203d6d0ee	rcu: Use single-stage IPI algorithm for RCU expedited grace period The current preemptible-RCU expedited grace-period algorithm invokes synchronize_sched_expedited() to enqueue all tasks currently running in a preemptible-RCU read-side critical section, then waits for all the ->blkd_tasks lists to drain. This works, but results in both an IPI and a double context switch even on CPUs that do not happen to be running in a preemptible RCU read-side critical section. This commit implements a new algorithm that causes less OS jitter. This new algorithm IPIs all online CPUs that are not idle (from an RCU perspective), but refrains from self-IPIs. If a CPU receiving this IPI is not in a preemptible RCU read-side critical section (or is just now exiting one), it pushes quiescence up the rcu_node tree; otherwise, it sets a flag that will be handled by the upcoming outermost rcu_read_unlock(), which will then push quiescence up the tree. The expedited grace period must of course wait on any pre-existing blocked readers, and newly blocked readers must be queued carefully based on the state of both the normal and the expedited grace periods. This new queueing approach also avoids the need to update boost state, courtesy of the fact that blocked tasks are no longer ever migrated to the root rcu_node structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-09-20 21:16:19 -07:00
Paul E. McKenney	b9585e940a	rcu: Consolidate tree setup for synchronize_rcu_expedited() This commit replaces sync_rcu_preempt_exp_init1() and sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug() and sync_exp_reset_tree(), which will also be used by synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which contains code specific to synchronize_rcu_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-09-20 21:16:18 -07:00
Paul E. McKenney	7922cd0e56	rcu: Move rcu_report_exp_rnp() to allow consolidation This is a nearly pure code-movement commit, moving rcu_report_exp_rnp(), sync_rcu_preempt_exp_done(), and rcu_preempted_readers_exp() so that later commits can make synchronize_sched_expedited() use them. The non-code-movement portion of this commit tags rcu_report_exp_rnp() as __maybe_unused to avoid build errors when CONFIG_PREEMPT=n. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-09-20 21:16:18 -07:00
Paul E. McKenney	f4ecea309d	rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq Now that there is an ->expedited_wq waitqueue in each rcu_state structure, there is no need for the sync_rcu_preempt_exp_wq global variable. This commit therefore substitutes ->expedited_wq for sync_rcu_preempt_exp_wq. It also initializes ->expedited_wq only once at boot instead of at the start of each expedited grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-09-20 21:16:17 -07:00
Paul E. McKenney	19a5ecde08	rcu: Suppress lockdep false positive for rcp->exp_funnel_mutex In kernels built with CONFIG_PREEMPT=y, synchronize_rcu_expedited() invokes synchronize_sched_expedited() while holding RCU-preempt's root rcu_node structure's ->exp_funnel_mutex, which is acquired after the rcu_data structure's ->exp_funnel_mutex. The first thing that synchronize_sched_expedited() will do is acquire RCU-sched's rcu_data structure's ->exp_funnel_mutex. There is no danger of an actual deadlock because the locking order is always from RCU-preempt's expedited mutexes to those of RCU-sched. Unfortunately, lockdep considers both rcu_data structures' ->exp_funnel_mutex to be in the same lock class and therefore reports a deadlock cycle. This commit silences this false positive by placing RCU-sched's rcu_data structures' ->exp_funnel_mutex locks into their own lock class. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-09-20 21:01:22 -07:00
Paul E. McKenney	8ff4fbfd69	Merge branches 'fixes.2015.07.22a' and 'initexp.2015.08.04a' into HEAD fixes.2015.07.22a: Miscellaneous fixes. initexp.2015.08.04a: Initialization and expedited updates. (Single branch due to conflicts.)	2015-08-04 08:40:58 -07:00
Paul E. McKenney	af859beaab	rcu: Silence lockdep false positive for expedited grace periods In a CONFIG_PREEMPT=y kernel, synchronize_rcu_expedited() acquires the ->exp_funnel_mutex in rcu_preempt_state, then invokes synchronize_sched_expedited(), which acquires the ->exp_funnel_mutex in rcu_sched_state. There can be no deadlock because rcu_preempt_state ->exp_funnel_mutex acquisition always precedes that of rcu_sched_state. But lockdep does not know that, so it gives false-positive splats. This commit therefore associates a separate lock_class_key structure with the rcu_sched_state structure's ->exp_funnel_mutex, allowing lockdep to see the lock ordering, avoiding the false positives. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-08-04 08:39:21 -07:00
Paul E. McKenney	f78f5b90c4	rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN() This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for consistency with the WARN() series of macros. This also requires inverting the sense of the conditional, which this commit also does. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org>	2015-07-22 15:27:32 -07:00
Alexei Starovoitov	46f00d18fc	rcu: Make rcu_is_watching() really notrace Although rcu_is_watching() is marked notrace, it invokes preempt_disable() and preempt_enable(), both of which can be traced. This defeats the purpose of the notrace on rcu_is_watching(), so this commit substitutes preempt_disable_notrace() and preempt_enable_notrace(). Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org>	2015-07-22 15:27:31 -07:00
Paul E. McKenney	24560056de	rcu: Add RCU-sched flavors of get-state and cond-sync The get_state_synchronize_rcu() and cond_synchronize_rcu() functions allow polling for grace-period completion, with an actual wait for a grace period occurring only when cond_synchronize_rcu() is called too soon after the corresponding get_state_synchronize_rcu(). However, these functions work only for vanilla RCU. This commit adds the get_state_synchronize_sched() and cond_synchronize_sched() functions, which provide the same capability for RCU-sched. Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-22 15:26:58 -07:00
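A single-threaded model of the polling idea, under stated assumptions: `gpnum` counts started grace periods, `completed` counts finished ones, and a full blocking wait is needed only when no grace period that began after the snapshot has completed. All names here are hypothetical, and `synchronize_sketch()` simply runs one grace period to completion.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Wraparound-safe "a >= b" for free-running counters. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

static atomic_ulong gpnum;     /* most recently started grace period */
static atomic_ulong completed; /* most recently finished grace period */
static int full_waits;         /* how often we fell back to blocking */

static void gp_start(void) { atomic_fetch_add(&gpnum, 1); }
static void gp_end(void)   { atomic_store(&completed, atomic_load(&gpnum)); }

/* Stand-in for a blocking synchronize call: one full grace period. */
static void synchronize_sketch(void)
{
	full_waits++;
	gp_start();
	gp_end();
}

static unsigned long get_state_sketch(void)
{
	atomic_thread_fence(memory_order_seq_cst); /* order prior accesses */
	return atomic_load_explicit(&gpnum, memory_order_acquire);
}

static void cond_synchronize_sketch(unsigned long oldstate)
{
	unsigned long newstate = atomic_load_explicit(&completed,
						      memory_order_acquire);

	/* Block only if no grace period started after the snapshot has ended. */
	if (ULONG_CMP_GE(oldstate, newstate))
		synchronize_sketch();
}
```

In the first case below a full grace period elapses after the snapshot, so no wait occurs; when the snapshot is taken mid-period, that period's end is not enough.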
Paul E. McKenney	cdacbe1f91	rcu: Add fastpath bypassing funnel locking In the common case, there will be only one expedited grace period in the system at a given time, in which case it is not helpful to use funnel locking. This commit therefore adds a fastpath that bypasses funnel locking when the root ->exp_funnel_mutex is not held. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:59:06 -07:00
Paul E. McKenney	32bb1c7999	rcu: Rename RCU_GP_DONE_FQS to RCU_GP_DOING_FQS The grace-period kthread sleeps waiting to do a force-quiescent-state scan, and when awakened sets rsp->gp_state to RCU_GP_DONE_FQS. However, this is confusing because the kthread has not done the force-quiescent-state, but is instead just starting to do it. This commit therefore renames RCU_GP_DONE_FQS to RCU_GP_DOING_FQS in order to make things a bit easier on reviewers. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:59:05 -07:00
Paul E. McKenney	b9a425cfcb	rcu: Pull out wait_event*() condition into helper function The condition for the wait_event_interruptible_timeout() that waits to do the next force-quiescent-state scan is a bit ornate: ((gf = READ_ONCE(rsp->gp_flags)) & RCU_GP_FLAG_FQS) || (!READ_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers_cgp(rnp)) This commit therefore pulls this condition out into a helper function and comments its component conditions. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:59:04 -07:00
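A sketch of the extracted predicate, assuming hypothetical pared-down structures and a `blocked_readers` flag in place of `rcu_preempt_blocked_readers_cgp()`; the flag values are assumptions. It returns the same truth value as the inline condition and still hands the `gp_flags` snapshot back through `gfp`, matching the original `gf = ...` side effect. The kernel reads these fields with `READ_ONCE()`, which this single-threaded sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

#define RCU_GP_FLAG_INIT 0x1 /* assumed value */
#define RCU_GP_FLAG_FQS  0x2 /* assumed value */

/* Hypothetical stand-ins for rcu_state and the root rcu_node. */
struct gp_state { int gp_flags; };
struct root_node {
	unsigned long qsmask; /* quiescent states still outstanding */
	bool blocked_readers; /* readers blocking the current grace period */
};

static bool fqs_check_wake(const struct gp_state *sp,
			   const struct root_node *rnp, int *gfp)
{
	*gfp = sp->gp_flags;
	/* Someone requested a force-quiescent-state scan. */
	if (*gfp & RCU_GP_FLAG_FQS)
		return true;
	/* The grace period can already end: nothing left to wait for. */
	if (!rnp->qsmask && !rnp->blocked_readers)
		return true;
	return false;
}
```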
Paul E. McKenney	cf3620a6c7	rcu: Add stall warnings to synchronize_sched_expedited() Although synchronize_sched_expedited() has historically had no RCU CPU stall warnings, the availability of the rcupdate.rcu_expedited boot parameter invalidates the old assumption that synchronize_sched()'s stall warnings would suffice. This commit therefore adds RCU CPU stall warnings to synchronize_sched_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:59:01 -07:00
Paul E. McKenney	2cd6ffafec	rcu: Extend expedited funnel locking to rcu_data structure The strictly rcu_node based funnel-locking scheme works well in many cases, but systems with CONFIG_RCU_FANOUT_LEAF=64 won't necessarily get all that much concurrency. This commit therefore extends the funnel locking into the per-CPU rcu_data structure, providing concurrency equal to the number of CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:59:00 -07:00
Paul E. McKenney	704dd435ac	rcu: Consolidate last open-coded expedited memory barrier One of the requirements on RCU grace periods is that if there is a causal chain of operations that starts after one grace period and ends before another grace period, then the two grace periods must be serialized. There has been (and might still be) code that relies on this, for example, certain types of reference-counting code that does a call_rcu() within an RCU callback function. This requirement is why there is an smp_mb() at the end of both synchronize_sched_expedited() and synchronize_rcu_expedited(). However, this is the only smp_mb() in these functions, so it would be nicer to consolidate it into rcu_exp_gp_seq_end(). This commit does just that. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:58:59 -07:00
Paul E. McKenney	4f525a528b	rcu: Apply rcu_seq operations to _rcu_barrier() The rcu_seq operations were open-coded in _rcu_barrier(), so this commit replaces the open-coding with the shiny new rcu_seq operations. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:58:57 -07:00
Paul E. McKenney	29fd930940	rcu: Use funnel locking for synchronize_rcu_expedited()'s polling loop This commit gets rid of synchronize_rcu_expedited()'s mutex_trylock() polling loop in favor of the funnel-locking scheme that was abstracted from synchronize_sched_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:58:56 -07:00
Paul E. McKenney	7fd0ddc5bf	rcu: Fix synchronize_sched_expedited() type error for "s" The type of "s" has been "long" rather than the correct "unsigned long" for quite some time. This commit fixes this type error. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:58:55 -07:00
Paul E. McKenney	b09e5f8601	rcu: Abstract funnel locking from synchronize_sched_expedited() This commit abstracts funnel locking from synchronize_sched_expedited() so that it may be used by synchronize_rcu_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:58:53 -07:00
Paul E. McKenney	28f00767e3	rcu: Abstract sequence counting from synchronize_sched_expedited() This commit creates rcu_exp_gp_seq_start() and rcu_exp_gp_seq_end() to bracket an expedited grace period, rcu_exp_gp_seq_snap() to snapshot the sequence counter, and rcu_exp_gp_seq_done() to check whether a full expedited grace period has elapsed since the snapshot. These will be applied to synchronize_rcu_expedited(). They are defined in terms of the underlying rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(), and rcu_seq_done() functions, which will be applied to _rcu_barrier(). One reason that this commit doesn't use the seqcount primitives themselves is that the smp_wmb() in those primitives is insufficient, because expedited grace periods do reads as well as writes. In addition, the read-side seqcount primitives detect a potentially partial change, whereas the expedited primitives need a guaranteed full change. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:58:51 -07:00
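The counter scheme behind these start/end/snap/done operations can be modeled in a small single-threaded sketch. The names `seq_start()`, `seq_end()`, `seq_snap()` and `seq_done()` are hypothetical stand-ins for the rcu_seq_*() functions named above. The kernel versions also contain memory barriers and READ_ONCE()/WRITE_ONCE() accesses, and this sketch omits them. The counter is odd while a grace period is in progress and even otherwise. A snapshot therefore has to wait for the end of the first grace period that *starts* after it, and rounding `(s + 3)` down to an even value gives exactly that target.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wrap-safe "a >= b" for free-running unsigned counters. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

/* Grace period begins: the counter becomes odd. */
static void seq_start(unsigned long *sp)
{
	++*sp;
	assert(*sp & 0x1);
}

/* Grace period ends: the counter becomes even again. */
static void seq_end(unsigned long *sp)
{
	++*sp;
	assert(!(*sp & 0x1));
}

/*
 * Counter value that marks the end of a full grace period after now.
 * Idle (even s): the next GP ends at s + 2.
 * In progress (odd s): that GP may have started too early to count,
 * so wait for the following one, which ends at s + 3.
 */
static unsigned long seq_snap(const unsigned long *sp)
{
	return (*sp + 3) & ~0x1UL;
}

/* Has the counter reached the snapshot target? */
static bool seq_done(const unsigned long *sp, unsigned long s)
{
	return ULONG_CMP_GE(*sp, s);
}
```

Using ULONG_CMP_GE() instead of a plain `>=` keeps the comparison correct after the counter wraps.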
Peter Zijlstra	3a6d7c64d7	rcu: Make expedited GP CPU stoppage asynchronous Sequentially stopping the CPUs slows down expedited grace periods by at least a factor of two, based on rcutorture's grace-period-per-second rate. This is a conservative measure because rcutorture uses unusually long RCU read-side critical sections and because rcutorture periodically quiesces the system in order to test RCU's ability to ramp down to and up from the idle state. This commit therefore replaces the stop_one_cpu() with stop_one_cpu_nowait(), using an atomic-counter scheme to determine when all CPUs have passed through the stopped state. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:58:50 -07:00
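The commit does not spell out its atomic-counter scheme. A common shape for it is sketched below under assumptions: the initiator holds one reference, each queued stop request adds one, and whoever drops the count to zero signals completion. The struct and function names are hypothetical. In this sequential sketch the stopper callbacks are invoked directly, where the kernel would queue them with stop_one_cpu_nowait() and they would run concurrently on other CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-grace-period bookkeeping. */
struct exp_sketch {
	atomic_int remaining;  /* outstanding stop requests, +1 for the initiator */
	bool all_done;         /* stand-in for a completion/wakeup */
};

/* Run by each CPU's stopper once that CPU has passed through the
 * stopped state; the caller that drops the count to zero signals. */
static void cpu_stopped(struct exp_sketch *e)
{
	if (atomic_fetch_sub(&e->remaining, 1) == 1)
		e->all_done = true;
}

/* Initiator: queue one non-waiting stop per CPU, then drop its own
 * reference. That extra reference prevents early finishers from
 * signalling completion before every request has been queued. */
static void run_expedited(struct exp_sketch *e, int ncpus)
{
	atomic_init(&e->remaining, 1);
	e->all_done = false;
	for (int cpu = 0; cpu < ncpus; cpu++) {
		atomic_fetch_add(&e->remaining, 1);
		cpu_stopped(e);          /* stands in for the queued callback */
		assert(!e->all_done);    /* initiator's reference still held */
	}
	cpu_stopped(e);                  /* drop the initiator's reference */
}
```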
Paul E. McKenney	385b73c06f	rcu: Get rid of synchronize_sched_expedited()'s polling loop This commit gets rid of synchronize_sched_expedited()'s mutex_trylock() polling loop in favor of a funnel-locking scheme based on the rcu_node tree. The work-done check is done at each level of the tree, allowing high-contention situations to be resolved quickly with reasonable levels of mutex contention. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:58:48 -07:00
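Funnel locking with a work-done check at each level can be sketched roughly as follows. The `struct fnode` tree, `work_done()` and `funnel_lock()` are hypothetical simplifications built on pthread mutexes, not the kernel's rcu_node code. The caller walks from its leaf toward the root and holds at most one node's mutex at a time. It bails out as soon as the sequence counter shows that another task has already finished the needed grace period.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

/* Hypothetical node of a combining tree (leaf -> ... -> root). */
struct fnode {
	pthread_mutex_t mutex;
	struct fnode *parent;
};

/* Has the grace-period sequence counter reached the snapshot? */
static bool work_done(const unsigned long *seq, unsigned long snap)
{
	return ULONG_CMP_GE(*seq, snap);
}

/*
 * Walk from @leaf to the root hand-over-hand. Returns NULL (with no
 * mutex held) if the work was done by someone else along the way,
 * otherwise returns the root with its mutex held.
 */
static struct fnode *funnel_lock(struct fnode *leaf,
				 const unsigned long *seq, unsigned long snap)
{
	struct fnode *held = NULL;

	assert(leaf != NULL);
	for (struct fnode *np = leaf; np != NULL; np = np->parent) {
		if (work_done(seq, snap)) {
			if (held)
				pthread_mutex_unlock(&held->mutex);
			return NULL;
		}
		pthread_mutex_lock(&np->mutex);
		if (held)
			pthread_mutex_unlock(&held->mutex);
		held = np;
	}
	/* Final check now that the root mutex is held. */
	if (work_done(seq, snap)) {
		pthread_mutex_unlock(&held->mutex);
		return NULL;
	}
	return held;
}
```

Under heavy contention, most callers stop at a lower level once the counter advances, so only a few tasks ever reach the root mutex.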
Paul E. McKenney	d6ada2cf2f	rcu: Rework synchronize_sched_expedited() counter handling Now that synchronize_sched_expedited() has a mutex, it can use a simpler work-already-done detection scheme. This commit simplifies this scheme by using something similar to the sequence-locking counter scheme. A counter is incremented before and after each grace period, so that the counter is odd in the midst of the grace period and even otherwise. So if the counter has advanced to the second even number that is greater than or equal to the snapshot, the required grace period has already happened. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:58:47 -07:00
Peter Zijlstra	c190c3b16c	rcu: Switch synchronize_sched_expedited() to stop_one_cpu() synchronize_sched_expedited() currently invokes try_stop_cpus(), which schedules the stopper kthreads on each online non-idle CPU, and waits until all those kthreads are running before letting any of them stop. This is disastrous for real-time workloads, which get hit with a preemption that is as long as the longest scheduling latency on any CPU, including any non-realtime housekeeping CPUs. This commit therefore switches to using stop_one_cpu() on each CPU in turn. This avoids inflicting the worst-case scheduling latency on the worst-case CPU onto all other CPUs, and also simplifies the code a little bit. Follow-up commits will simplify the counter-snapshotting algorithm and convert a number of the counters that are now protected by the new ->expedited_mutex to non-atomic. Signed-off-by: Peter Zijlstra <peterz@infradead.org> [ paulmck: Kept stop_one_cpu(), dropped disabling of "guardrails". ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:58:45 -07:00
Paul E. McKenney	13bd64947f	rcu: Reset rcu_fanout_leaf if out of bounds Currently if the rcu_fanout_leaf boot parameter is out of bounds (that is, less than RCU_FANOUT_LEAF or greater than the number of bits in an unsigned long), a warning is issued and execution continues with the out-of-bounds value. This can result in all manner of failures, so this patch resets rcu_fanout_leaf to RCU_FANOUT_LEAF when an out-of-bounds condition is detected. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-17 14:58:41 -07:00
Alexander Gordeev	cb00710239	rcu: Limit count of static data to the number of RCU levels Although the number of RCU levels may be less than the current maximum of four, some static data associated with each level are allocated for all four levels. As a result, the extra data never get accessed and just waste memory. This update limits the count of allocated items to the number of RCU levels actually used. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-15 14:45:20 -07:00
Alexander Gordeev	199977bff9	rcu: Remove unnecessary fields from rcu_state structure Members rcu_state::levelcnt[] and rcu_state::levelspread[] are only used at init. There is no reason to keep them afterwards. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-15 14:45:19 -07:00
Alexander Gordeev	05b84aec46	rcu: Limit rcu_capacity[] size to RCU_NUM_LVLS items The number of items in the rcu_capacity[] array is defined by the macro MAX_RCU_LVLS. However, that array is never accessed at index RCU_NUM_LVLS or beyond. Therefore, we can limit the array to RCU_NUM_LVLS items and eliminate MAX_RCU_LVLS. As a result, memory is conserved in most cases. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-15 14:45:18 -07:00
Alexander Gordeev	9618138b09	rcu: Simplify rcu_init_geometry() capacity arithmetics The current code suggests that adding an extra level to the rcu_capacity[] array makes some of the arithmetic easier. In fact, the extra level makes the code rather confusing and is unnecessary. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-15 14:45:15 -07:00
Alexander Gordeev	679f9858b1	rcu: Cleanup rcu_init_geometry() code and arithmetics This update simplifies the rcu_init_geometry() code flow and makes the calculation of the total number of rcu_node structures easier to read. The update relies on the fact that num_rcu_lvl[] is never accessed beyond the rcu_num_lvls index by the rest of the code. Therefore, there is no need to initialize the whole num_rcu_lvl[] array. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-15 14:45:14 -07:00
Alexander Gordeev	372b0ec24f	rcu: Remove superfluous local variable in rcu_init_geometry() The local variable 'n' merely duplicates 'nr_cpu_ids', and both are used within the same function. There is no reason for 'n' to exist. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-15 14:45:13 -07:00
Alexander Gordeev	75cf15a4c0	rcu: Panic if RCU tree can not accommodate all CPUs Currently, a condition where the RCU tree is unable to accommodate the configured number of CPUs is not permitted and causes a fallback to compile-time values. However, the code has no means to exceed the RCU tree capacity either at compile time or at run time. Therefore, if the condition is met at run time, it indicates a serious problem elsewhere and should be handled with a panic. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-15 14:45:12 -07:00
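The capacity arithmetic that the rcu_init_geometry() commits above simplify can be illustrated with a standalone sketch. The function name `init_geometry()`, its parameters, and the maximum depth of 4 are assumptions for illustration. The sketch returns -1 where the kernel panics. rcu_capacity[i] is the number of CPUs that a tree with i + 1 levels can cover. The depth is the smallest one that fits nr_cpu_ids, and each level's node count is a rounded-up division.

```c
#include <assert.h>

#define RCU_NUM_LVLS 4   /* assumed maximum tree depth */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Fill num_rcu_lvl[0..*num_lvls-1] (level 0 is the root) for
 * @nr_cpu_ids CPUs, @fanout_leaf CPUs per leaf, @fanout children per
 * interior node. Returns the total number of nodes, or -1 where the
 * kernel would panic because the tree cannot hold every CPU.
 */
static int init_geometry(int nr_cpu_ids, int fanout_leaf, int fanout,
			 int *num_lvls, int num_rcu_lvl[RCU_NUM_LVLS])
{
	int rcu_capacity[RCU_NUM_LVLS];
	int i, total = 0;

	assert(nr_cpu_ids > 0 && fanout_leaf > 0 && fanout > 1);

	/* rcu_capacity[i]: max CPUs covered by a tree of i + 1 levels. */
	rcu_capacity[0] = fanout_leaf;
	for (i = 1; i < RCU_NUM_LVLS; i++)
		rcu_capacity[i] = rcu_capacity[i - 1] * fanout;

	if (nr_cpu_ids > rcu_capacity[RCU_NUM_LVLS - 1])
		return -1;

	/* Smallest depth whose capacity covers every CPU. */
	for (i = 0; nr_cpu_ids > rcu_capacity[i]; i++)
		;
	*num_lvls = i + 1;

	/* Nodes per level: CPUs divided by what one node there covers. */
	for (i = 0; i < *num_lvls; i++) {
		int cap = rcu_capacity[(*num_lvls - 1) - i];

		num_rcu_lvl[i] = DIV_ROUND_UP(nr_cpu_ids, cap);
		total += num_rcu_lvl[i];
	}
	return total;
}
```

For example, 100 CPUs with 16 CPUs per leaf and a fanout of 64 need two levels: one root plus ceil(100 / 16) = 7 leaves, for 8 nodes in total. Only the first `*num_lvls` entries of `num_rcu_lvl[]` are written, which matches the observation above that the rest of the array never needs initializing.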
Paul E. McKenney	319362c90f	rcu: Provide more diagnostics for stalled GP kthread Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-07-15 14:45:10 -07:00
Paul E. McKenney	d1ec4c34c7	rcu: Drop RCU_USER_QS in favor of NO_HZ_FULL The RCU_USER_QS Kconfig parameter is now just a synonym for NO_HZ_FULL, so this commit eliminates RCU_USER_QS, replacing all uses with NO_HZ_FULL. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>	2015-07-06 13:52:18 -07:00
Linus Torvalds	e382608254	Merge tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This patch series contains several clean ups and even a new trace clock "monotonic raw". Also some enhancements to make the ring buffer even faster. But the biggest and most noticeable change is the renaming of the ftrace* files, structures and variables that have to deal with trace events. Over the years I've had several developers tell me about their confusion with what ftrace is compared to events. Technically, "ftrace" is the infrastructure to do the function hooks, which include tracing and also helps with live kernel patching. But the trace events are a separate entity altogether, and the files that affect the trace events should not be named "ftrace". 
These include: include/trace/ftrace.h -> include/trace/trace_events.h include/linux/ftrace_event.h -> include/linux/trace_events.h Also, functions that are specific for trace events have also been renamed: ftrace_print_() -> trace_print_() (un)register_ftrace_event() -> (un)register_trace_event() ftrace_event_name() -> trace_event_name() ftrace_trigger_soft_disabled() -> trace_trigger_soft_disabled() ftrace_define_fields_##call() -> trace_define_fields_##call() ftrace_get_offsets_##call() -> trace_get_offsets_##call() Structures have been renamed: ftrace_event_file -> trace_event_file ftrace_event_{call,class} -> trace_event_{call,class} ftrace_event_buffer -> trace_event_buffer ftrace_subsystem_dir -> trace_subsystem_dir ftrace_event_raw_##call -> trace_event_raw_##call ftrace_event_data_offset_##call-> trace_event_data_offset_##call ftrace_event_type_funcs_##call -> trace_event_type_funcs_##call And a few various variables and flags have also been updated. This has been sitting in linux-next for some time, and I have not heard a single complaint about this rename breaking anything. 
Mostly because these functions, variables and structures are mostly internal to the tracing system and are seldom (if ever) used by anything external to that" * tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits) ring_buffer: Allow to exit the ring buffer benchmark immediately ring-buffer-benchmark: Fix the wrong type ring-buffer-benchmark: Fix the wrong param in module_param ring-buffer: Add enum names for the context levels ring-buffer: Remove useless unused tracing_off_permanent() ring-buffer: Give NMIs a chance to lock the reader_lock ring-buffer: Add trace_recursive checks to ring_buffer_write() ring-buffer: Allways do the trace_recursive checks ring-buffer: Move recursive check to per_cpu descriptor ring-buffer: Add unlikelys to make fast path the default tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call() tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call() tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled() tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_* tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir tracing: Rename ftrace_event_name() to trace_event_name() tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX ...	2015-06-26 14:02:43 -07:00
Paul E. McKenney	0868aa2216	Merge branches 'array.2015.05.27a', 'doc.2015.05.27a', 'fixes.2015.05.27a', 'hotplug.2015.05.27a', 'init.2015.05.27a', 'tiny.2015.05.27a' and 'torture.2015.05.27a' into HEAD array.2015.05.27a: Remove all uses of RCU-protected array indexes. doc.2015.05.27a: Documentation updates. fixes.2015.05.27a: Miscellaneous fixes. hotplug.2015.05.27a: CPU-hotplug updates. init.2015.05.27a: Initialization/Kconfig updates. tiny.2015.05.27a: Updates to Tiny RCU. torture.2015.05.27a: Torture-testing updates.	2015-05-27 13:00:49 -07:00
Paul E. McKenney	1ce46ee597	rcu: Conditionally compile RCU's eqs warnings This commit applies some warning-omission micro-optimizations to RCU's various extended-quiescent-state functions, which are on the kernel/user hotpath for CONFIG_NO_HZ_FULL=y. Reported-by: Rik van Riel <riel@redhat.com> Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-05-27 12:59:07 -07:00
Paul E. McKenney	26730f55c2	rcu: Make RCU able to tolerate undefined CONFIG_RCU_KTHREAD_PRIO This commit updates the initialization of the kthread_prio boot parameter so that RCU will build even when CONFIG_RCU_KTHREAD_PRIO is undefined. The kthread_prio boot parameter is set to CONFIG_RCU_KTHREAD_PRIO if that is defined, otherwise to 1 if CONFIG_RCU_BOOST is defined and to zero otherwise. This commit then makes CONFIG_RCU_KTHREAD_PRIO depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about CONFIG_RCU_KTHREAD_PRIO unless they want to be. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2015-05-27 12:59:06 -07:00
Paul E. McKenney	47d631af58	rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT_LEAF This commit introduces an RCU_FANOUT_LEAF C-preprocessor macro so that RCU will build even when CONFIG_RCU_FANOUT_LEAF is undefined. The RCU_FANOUT_LEAF macro is set to the value of CONFIG_RCU_FANOUT_LEAF when defined, otherwise it is set to 32 for 32-bit systems and 64 for 64-bit systems. This commit then makes CONFIG_RCU_FANOUT_LEAF depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about CONFIG_RCU_FANOUT_LEAF unless they want to be. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2015-05-27 12:59:05 -07:00
Paul E. McKenney	05c5df31af	rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT This commit introduces an RCU_FANOUT C-preprocessor macro so that RCU will build even when CONFIG_RCU_FANOUT is undefined. The RCU_FANOUT macro is set to the value of CONFIG_RCU_FANOUT when defined, otherwise it is set to 32 for 32-bit systems and 64 for 64-bit systems. This commit then makes CONFIG_RCU_FANOUT depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about CONFIG_RCU_FANOUT unless they want to be. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2015-05-27 12:59:05 -07:00
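The three "tolerate undefined CONFIG_*" commits above follow the same preprocessor fallback pattern. The sketch below shows it for RCU_FANOUT_LEAF and kthread_prio, with the fallback values taken from the commit text. Two things are assumptions here: the `ULONG_MAX` comparison, which stands in for the kernel's word-size configuration, and the helper `rcu_kthread_prio()`. All CONFIG_* symbols are left undefined to model the case these commits fix.

```c
#include <assert.h>
#include <limits.h>

/* Model Kconfig by defining these; all are left undefined here. */
/* #define CONFIG_RCU_FANOUT_LEAF 16 */
/* #define CONFIG_RCU_KTHREAD_PRIO 3 */
/* #define CONFIG_RCU_BOOST 1 */

/* Fall back to 64 on 64-bit and 32 on 32-bit systems; ULONG_MAX is an
 * assumed stand-in for the kernel's word-size configuration. */
#ifdef CONFIG_RCU_FANOUT_LEAF
#define RCU_FANOUT_LEAF CONFIG_RCU_FANOUT_LEAF
#elif ULONG_MAX > 0xffffffffUL
#define RCU_FANOUT_LEAF 64
#else
#define RCU_FANOUT_LEAF 32
#endif

/* Use the Kconfig value if present, else 1 with boosting, else 0. */
#ifdef CONFIG_RCU_KTHREAD_PRIO
static int kthread_prio = CONFIG_RCU_KTHREAD_PRIO;
#elif defined(CONFIG_RCU_BOOST)
static int kthread_prio = 1;
#else
static int kthread_prio;   /* static storage: zero-initialized */
#endif

/* Hypothetical accessor with a sanity check on the configured value. */
static int rcu_kthread_prio(void)
{
	assert(kthread_prio >= 0);
	return kthread_prio;
}
```

With this pattern the code still builds when the Kconfig question is hidden behind CONFIG_RCU_EXPERT and the symbol is never defined.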
Paul E. McKenney	a3dc2948ce	rcu: Enable diagnostic dump of rcu_node combining tree The purpose of this commit is to make it easier to verify that RCU's combining tree is set up correctly, which is useful to have when making changes in how that tree is initialized. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com> [ paulmck: Fold fix found by Fengguang's 0-day test robot. ]	2015-05-27 12:59:04 -07:00
Paul E. McKenney	7fa270010e	rcu: Convert CONFIG_RCU_FANOUT_EXACT to boot parameter The CONFIG_RCU_FANOUT_EXACT Kconfig parameter is used primarily (and perhaps only) by rcutorture to verify that RCU works correctly in specific rcu_node combining-tree configurations. It therefore does not make much sense to have this as a question to people attempting to configure their kernels. So this commit creates an rcutree.rcu_fanout_exact= boot parameter that rcutorture can use, and eliminates the original CONFIG_RCU_FANOUT_EXACT Kconfig parameter. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2015-05-27 12:59:04 -07:00
Paul E. McKenney	0f41c0ddad	rcu: Provide diagnostic option to slow down grace-period scans Grace-period scans of the rcu_node combining tree normally proceed quite quickly, so that it is very difficult to reproduce races against them. This commit therefore allows grace-period pre-initialization and cleanup to be artificially slowed down, increasing race-reproduction probability. Two pairs of new Kconfig parameters are provided: RCU_TORTURE_TEST_SLOW_PREINIT to enable the slowing down of propagating CPU-hotplug changes up the combining tree along with RCU_TORTURE_TEST_SLOW_PREINIT_DELAY to specify the delay in jiffies, and RCU_TORTURE_TEST_SLOW_CLEANUP to enable the slowing down of the end-of-grace-period cleanup scan along with RCU_TORTURE_TEST_SLOW_CLEANUP_DELAY to specify the delay in jiffies. Boot-time parameters named rcutree.gp_preinit_delay and rcutree.gp_cleanup_delay allow these delays to be specified at boot time. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-05-27 12:59:02 -07:00
Paul E. McKenney	3eaaaf6cd6	rcu: Shut up spurious gcc uninitialized-variable warning Because gcc doesn't realize that rcu_num_lvls must be strictly greater than zero, some versions give a spurious warning about levelcnt[0] being uninitialized in rcu_init_one(). This commit updates the condition on the pre-existing panic() in order to educate gcc on this point. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-05-27 12:59:02 -07:00
Paul E. McKenney	eab128e830	rcu: Modulate grace-period slow init to normalize delay Currently, the larger the gp_init_delay boot parameter, the slower rcutorture will sequence through grace periods. This commit avoids this issue by decreasing the probability of slowing initialization of a given grace period as the degree of slowness increases. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-05-27 12:59:01 -07:00
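One way to make the probability of slowing a grace period fall as the delay grows is to scale the modulus by the delay itself, as in the sketch below. The function `slow_this_gp()`, the exact modulus formula, and the value 10 for `PER_RCU_NODE_PERIOD` are assumptions for illustration, not taken from the patch. Suppose each of the `num_nodes` initialization steps of a selected grace period sleeps `delay` jiffies. The average added cost is then num_nodes * delay / (num_nodes * PER_RCU_NODE_PERIOD * delay) = 1 / PER_RCU_NODE_PERIOD jiffies per grace period, independent of the delay.

```c
#include <assert.h>
#include <stdbool.h>

#define PER_RCU_NODE_PERIOD 10   /* assumed value for this sketch */

/*
 * Decide whether grace period @gpnum should have its initialization
 * slowed by @delay jiffies per rcu_node. A larger delay selects
 * proportionally fewer grace periods.
 */
static bool slow_this_gp(unsigned long gpnum, int num_nodes, int delay)
{
	assert(num_nodes > 0);
	return delay > 0 &&
	       gpnum % ((unsigned long)num_nodes * PER_RCU_NODE_PERIOD * delay) == 0;
}
```

With 2 rcu_node structures and a delay of 3 jiffies, 1 grace period in 60 is selected. Doubling the delay to 6 halves that rate to 1 in 120.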
Paul E. McKenney	a738eec6c6	rcu: Correctly initialize ->rcu_qs_ctr_snap at online time The rcu_data structure's ->rcu_qs_ctr_snap field is initialized at CPU-online time from the current CPU's element of the per-CPU rcu_qs_ctr variable. Unfortunately, this initialization runs at CPU_UP_PREPARE time on a CPU other than the one being onlined, so the current CPU's element has nothing to do with the incoming CPU. This commit therefore initializes this variable from the incoming CPU's element of rcu_qs_ctr. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-05-27 12:58:38 -07:00
Paul E. McKenney	cce7f1fc01	rcu: Remove redundant offline check Because offline CPUs are propagated up the rcu_node tree's ->qsmaskinit bits just before each grace period starts, the ->qsmaskinit bit cannot be clear when the corresponding ->qsmask bit is set. Furthermore, this condition used to correspond to a CPU that was on its way offline, and making RCU's notion of an offline CPU more precise has eliminated this situation. This commit therefore removes the now-redundant offline check from force_qs_rnp(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-05-27 12:58:38 -07:00
Paul E. McKenney	c5b5539506	rcu: Remove dead code from force_qs_rnp() Because force_qs_rnp() is invoked only from the force-quiescent-state code which runs only in the context of the grace-period kthread, a grace period must always be in progress throughout force_qs_rnp()'s execution. This commit therefore removes the rcu_gp_in_progress() check and the associated dead code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-05-27 12:58:37 -07:00
Paul E. McKenney	ea46351cea	rcu: Eliminate HOTPLUG_CPU #ifdef in favor of IS_ENABLED() This commit removes a HOTPLUG_CPU #ifdef, replacing it with IS_ENABLED()-protected return statements. This relies on the optimizer to remove any resulting dead code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-05-27 12:58:37 -07:00
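Replacing an `#ifdef` with an IS_ENABLED()-protected return works because IS_ENABLED() expands to a compile-time constant 0 or 1. The sketch below gives a simplified, renamed copy of the placeholder trick that include/linux/kconfig.h uses, followed by an early-return function. The macro helper names, `CONFIG_SOME_DISABLED_OPTION`, and `cleanup_offline_cpu()` are hypothetical. When the option is off, the optimizer discards the code after the return, but that code is still parsed and type-checked, unlike code hidden by `#ifdef`.

```c
#include <assert.h>

/* Simplified copy of the placeholder trick behind IS_ENABLED(): an
 * option #defined to 1 yields 1; an undefined option yields 0. */
#define SKETCH_PLACEHOLDER_1 0,
#define SKETCH_SECOND_ARG(ignored, val, ...) val
#define SKETCH_IS_DEFINED_3(arg1_or_junk) SKETCH_SECOND_ARG(arg1_or_junk 1, 0)
#define SKETCH_IS_DEFINED_2(val) SKETCH_IS_DEFINED_3(SKETCH_PLACEHOLDER_##val)
#define SKETCH_IS_DEFINED(x) SKETCH_IS_DEFINED_2(x)
#define IS_ENABLED(option) \
	(SKETCH_IS_DEFINED(option) || SKETCH_IS_DEFINED(option##_MODULE))

#define CONFIG_HOTPLUG_CPU 1   /* pretend Kconfig enabled this */
/* CONFIG_SOME_DISABLED_OPTION is deliberately left undefined. */

static int cpus_cleaned;

/* Hypothetical hotplug-only work: an early return replaces #ifdef. */
static void cleanup_offline_cpu(int cpu)
{
	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
		return;   /* constant-folded; the rest becomes dead code */
	assert(cpu >= 0);
	cpus_cleaned++;
}

static int disabled_option_value(void)
{
	return IS_ENABLED(CONFIG_SOME_DISABLED_OPTION);
}
```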
Nicholas Mc Guire	82072c4fcf	rcu: Change function declaration to bool rcu_cpu_has_callbacks() is declared int. The current declaration was introduced in commit `c0f4dfd4f9` (rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks). But it actually returns bool, and as the function description states " * Return true if the specified CPU has any callback....", it probably should be a bool, since all (3) call sites currently treat it as bool. Type-checking coccinelle spatches are being used to locate type mismatches between function signatures and return values; in this case, this produced: ./kernel/rcu/tree.c:3538 WARNING: return of wrong type int != bool. Patch was compile tested with x86_64_defconfig (implies CONFIG_TREE_RCU=y). Patch is against 4.1-rc3 (localversion-next is -next-20150511) and fixes Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-05-27 12:58:04 -07:00
Nicolas Iooss	c92fb05795	rcu: Make rcu_*_data variables static rcu_bh_data, rcu_sched_data and rcu_preempt_data are never used outside kernel/rcu/tree.c and thus can be made static. Doing so fixes a section mismatch warning reported by clang when building LLVMLinux with -Wsection, because these variables were declared in .data..percpu and defined in .data..percpu..shared_aligned since commit `11bbb235c2` ("rcu: Use DEFINE_PER_CPU_SHARED_ALIGNED for rcu_data"). Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-05-27 12:58:03 -07:00
Paul E. McKenney	30ff1533b8	rcu: Make synchronize_sched_expedited() call wait_rcu_gp() Currently, synchronize_sched_expedited() will call synchronize_sched() if there is danger of counter wrap. But if configuration says to always do expedited grace periods, synchronize_sched() will just call synchronize_sched_expedited() right back again. In theory, the old expedited operations will complete, the counters will get back in synch, and the recursion will end. But we could easily run out of stack long before that time. This commit therefore makes synchronize_sched_expedited() invoke the underlying wait_rcu_gp(call_rcu_sched) instead of synchronize_sched(), the same as all the other calls out from synchronize_sched_expedited(). This bug was introduced by commit `1924bcb025` (Avoid counter wrap in synchronize_sched_expedited()). Reported-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-05-27 12:58:03 -07:00
Paul E. McKenney	81e701e437	rcu: Add more debug info on "kthread starved" RCU CPU stall warnings This commit adds grace-period number and command-flags information to the "kthread starved" message that is sometimes printed out as part of RCU CPU stall warnings. This message is caused by the corresponding RCU grace-period kthread not having run for at least two seconds, and this added information can be helpful when debugging. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-05-27 12:58:02 -07:00
Paul E. McKenney	cd73ca21cd	rcu: Force wakeup of rcu_gp_kthread at grace-period end The rcu_gp_kthread_wake() refuses to do a wakeup unless at least one of the ->gp_flags bits is set, which normally will not be the case when the last quiescent state is reported. This results in up to a 3-jiffy delay given default Kconfig settings. This commit therefore has rcu_report_qs_rsp() set RCU_GP_FLAG_FQS before invoking rcu_gp_kthread_wake() in order to force a more immediate wakeup at grace-period end, thus reducing grace-period latencies. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-05-27 12:58:01 -07:00
Paul E. McKenney	2927a689e8	rcu: Create an immutable rcu_data_p pointer to default rcu_data structure This commit creates an immutable rcu_data_p pointer that references rcu_preempt_data for TREE_PREEMPT_RCU builds and that references rcu_sched_data for TREE_RCU builds. This rcu_data_p pointer will enable more code to move from #ifdef to IS_ENABLED(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-05-27 12:58:00 -07:00
Paul E. McKenney	b28a7c0166	rcu: Tell the compiler that rcu_state_p is immutable This commit adds a "const" tag to the declarations of rcu_state_p, which should allow the compiler to generate better code and also to catch erroneous assignments to this variable. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-05-27 12:57:59 -07:00
Paul E. McKenney	7d0ae8086b	rcu: Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE() This commit moves from the old ACCESS_ONCE() API to the new READ_ONCE() and WRITE_ONCE() APIs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Updated to include kernel/torture.c as suggested by Jason Low. ]	2015-05-27 12:56:15 -07:00
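For scalar types, READ_ONCE() and WRITE_ONCE() can be approximated in userspace with volatile casts, much as the old ACCESS_ONCE() was. The macros below are a simplified sketch and not the kernel's definitions, which handle more cases. They rely on the GCC/Clang `__typeof__` extension. The volatile access makes the compiler emit one load or store per use instead of caching the value in a register. It does not provide any memory ordering between CPUs. The `stop_flag` loop is a hypothetical example of the classic bug these macros prevent: a plain load hoisted out of a polling loop.

```c
#include <assert.h>

/* Simplified userspace approximations for scalar types only. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int stop_flag;

/* Poll until stop_flag is set or @max_iters passes elapse. With a
 * plain load, the compiler could read stop_flag once and never again. */
static int poll_until_stopped(int max_iters)
{
	int i;

	assert(max_iters >= 0);
	for (i = 0; i < max_iters && !READ_ONCE(stop_flag); i++)
		;
	return i;
}
```

Splitting the old macro into a read form and a write form makes the direction of each access explicit at the call site.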
Steven Rostedt (Red Hat)	af658dca22	tracing: Rename ftrace_event.h to trace_events.h The term "ftrace" is really the infrastructure of the function hooks, and not the trace events. Rename ftrace_event.h to trace_events.h to represent the trace_event infrastructure and decouple the term ftrace from it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2015-05-13 14:05:12 -04:00
Paul E. McKenney	8d7dc9283f	rcu: Control grace-period delays directly from value In a misguided attempt to avoid an #ifdef, the use of the gp_init_delay module parameter was conditioned on the corresponding RCU_TORTURE_TEST_SLOW_INIT Kconfig variable, using IS_ENABLED() at the point of use in the code. This meant that the compiler always saw the delay, which meant that RCU_TORTURE_TEST_SLOW_INIT_DELAY had to be unconditionally defined. This in turn caused "make oldconfig" to ask pointless questions about the value of RCU_TORTURE_TEST_SLOW_INIT_DELAY in cases where it was not even used. This commit avoids these pointless questions by defining gp_init_delay under #ifdef. In one branch, gp_init_delay is initialized to RCU_TORTURE_TEST_SLOW_INIT_DELAY and is also a module parameter (thus allowing boot-time modification), and in the other branch gp_init_delay is a const variable initialized by default to zero. This approach also simplifies the code at the delay point by eliminating the IS_ENABLED(). Because gp_init_delay is constant zero in the no-delay case intended for production use, the "gp_init_delay > 0" check causes the delay to become dead code, as desired in this case. In addition, this commit replaces magic constant "10" with the preprocessor variable PER_RCU_NODE_PERIOD, which controls the number of grace periods that are allowed to elapse at full speed before a delay is inserted. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-04-14 19:33:59 -07:00
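The `#ifdef` arrangement described above can be sketched as follows. Two parts are simplified and assumed: the hypothetical `maybe_slow_init()` helper and its plain `gpnum % PER_RCU_NODE_PERIOD` test. The module-parameter registration is left out. With both CONFIG_* symbols undefined, `gp_init_delay` is a `const` zero, so the compiler can prove that `gp_init_delay > 0` is false and drop the delay path entirely.

```c
#include <stdbool.h>

/* #define CONFIG_RCU_TORTURE_TEST_SLOW_INIT 1 */
/* #define CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY 3 */

#define PER_RCU_NODE_PERIOD 10   /* replaces the former magic "10" */

#ifdef CONFIG_RCU_TORTURE_TEST_SLOW_INIT
/* Tunable; the kernel also exposes this as a module parameter. */
static int gp_init_delay = CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY;
#else
static const int gp_init_delay;   /* constant zero in production builds */
#endif

static int delays_taken;

/* Hypothetical per-rcu_node hook during grace-period initialization.
 * When gp_init_delay is the constant 0, the if-body is dead code. */
static void maybe_slow_init(unsigned long gpnum)
{
	if (gp_init_delay > 0 && !(gpnum % PER_RCU_NODE_PERIOD))
		delays_taken++;   /* kernel would sleep gp_init_delay jiffies */
}
```

Because the production branch never mentions RCU_TORTURE_TEST_SLOW_INIT_DELAY, "make oldconfig" no longer needs to ask for its value.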
Paul E. McKenney	42528795ac	Merge branches 'doc.2015.02.26a', 'earlycb.2015.03.03a', 'fixes.2015.03.03a', 'gpexp.2015.02.26a', 'hotplug.2015.03.20a', 'sysidle.2015.02.26b' and 'tiny.2015.02.26a' into HEAD doc.2015.02.26a: Documentation changes earlycb.2015.03.03a: Permit early-boot RCU callbacks fixes.2015.03.03a: Miscellaneous fixes gpexp.2015.02.26a: In-kernel expediting of normal grace periods hotplug.2015.03.20a: CPU hotplug fixes sysidle.2015.02.26b: NO_HZ_FULL_SYSIDLE fixes tiny.2015.02.26a: TINY_RCU fixes	2015-03-20 08:31:01 -07:00
Paul E. McKenney	654e953340	rcu: Associate quiescent-state reports with grace period As noted in earlier commit logs, CPU hotplug operations running concurrently with grace-period initialization can result in a given leaf rcu_node structure having all CPUs offline and no blocked readers, but with this rcu_node structure nevertheless blocking the current grace period. Therefore, the quiescent-state forcing code now checks for this situation and repairs it. Unfortunately, this checking can result in false positives, for example, when the last task has just removed itself from this leaf rcu_node structure, but has not yet started clearing the ->qsmask bits further up the structure. This means that the grace-period kthread (which forces quiescent states) and some other task might be attempting to concurrently clear these ->qsmask bits. This is usually not a problem: One of these tasks will be the first to acquire the upper-level rcu_node structure's lock and will therefore clear the bit, and the other task, seeing the bit already cleared, will stop trying to clear bits. Sadly, this means that the following unusual sequence of events -can- result in a problem: 1. The grace-period kthread wins, and clears the ->qsmask bits. 2. This is the last thing blocking the current grace period, so that the grace-period kthread clears ->qsmask bits all the way to the root and finds that the root ->qsmask field is now zero. 3. Another grace period is required, so that the grace period kthread initializes it, including setting all the needed qsmask bits. 4. The leaf rcu_node structure (the one that started this whole mess) is blocking this new grace period, either because it has at least one online CPU or because there is at least one task that had blocked within an RCU read-side critical section while running on one of this leaf rcu_node structure's CPUs. 
(And yes, that CPU might well have gone offline before the grace period in step (3) above started, which can mean that there is a task on the leaf rcu_node structure's ->blkd_tasks list, but ->qsmask equal to zero.) 5. The other kthread didn't get around to trying to clear the upper level ->qsmask bits until all the above had happened. This means that it now sees bits set in the upper-level ->qsmask field, so it proceeds to clear them. Too bad that it is doing so on behalf of a quiescent state that does not apply to the current grace period! This sequence of events can result in the new grace period being too short. It can also result in the new grace period ending before the leaf rcu_node structure's ->qsmask bits have been cleared, which will result in splats during initialization of the next grace period. In addition, it can result in tasks blocking the new grace period still being queued at the start of the next grace period, which will result in other splats. Sasha's testing turned up another of these splats, as did rcutorture testing. (And yes, rcutorture is being adjusted to make these splats show up more quickly. Which probably is having the undesirable side effect of making other problems show up less quickly. Can't have everything!) Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> # 4.0.x Tested-by: Sasha Levin <sasha.levin@oracle.com>	2015-03-20 08:28:25 -07:00
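The fix described in the commit above amounts to tagging each quiescent-state report with the grace-period number that was current when the quiescent state was observed, and discarding the report if the rcu_node structure has since moved on to a new grace period. The following C sketch illustrates only that check: struct qs_node and report_qs() are hypothetical simplifications, not the kernel's rcu_node structure or its reporting function, and all locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an rcu_node structure. */
struct qs_node {
	unsigned long gpnum;   /* grace period this node is currently serving */
	unsigned long qsmask;  /* children still blocking that grace period */
};

/*
 * Clear @mask on behalf of a quiescent state observed during grace
 * period @gps.  Returns false, leaving ->qsmask untouched, if the bits
 * were already cleared or if @gps is not the grace period the node is
 * serving now (a stale report from an earlier grace period).
 */
static bool report_qs(struct qs_node *rnp, unsigned long mask,
		      unsigned long gps)
{
	if (!(rnp->qsmask & mask) || rnp->gpnum != gps)
		return false;
	rnp->qsmask &= ~mask;
	return true;
}
```

In terms of the five-step sequence: both contenders sample grace period 7, the grace-period kthread's report succeeds (steps 1 and 2), a new grace period 8 sets the bit again (steps 3 and 4), and the late task's report for 7 is then rejected instead of wrongly shortening grace period 8 (step 5).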
Paul E. McKenney	a77da14ce9	rcu: Yet another fix for preemption and CPU hotplug As noted earlier, the following sequence of events can occur when running PREEMPT_RCU and HOTPLUG_CPU on a system with a multi-level rcu_node combining tree: 1. A group of tasks block on CPUs corresponding to a given leaf rcu_node structure while within RCU read-side critical sections. 2. All CPUs corresponding to that rcu_node structure go offline. 3. The next grace period starts, but because there are still tasks blocked, the upper-level bits corresponding to this leaf rcu_node structure remain set. 4. All the tasks exit their RCU read-side critical sections and remove themselves from the leaf rcu_node structure's list, leaving it empty. 5. But because there now is code to check for this condition at force-quiescent-state time, the upper bits are cleared and the grace period completes. However, there is another complication that can occur following step 4 above: 4a. The grace period starts, and the leaf rcu_node structure's ->gp_tasks pointer is set to NULL because there are no tasks blocked on this structure. 4b. One of the CPUs corresponding to the leaf rcu_node structure comes back online. 4c. An endless stream of tasks is preempted within RCU read-side critical sections on this CPU, such that the ->blkd_tasks list is always non-empty. The grace period will never end. This commit therefore makes the force-quiescent-state processing check only for absence of tasks blocking the current grace period rather than absence of tasks altogether. This will cause a quiescent state to be reported if the current leaf rcu_node structure is not blocking the current grace period and its parent thinks that it is, regardless of how RCU managed to get itself into this state. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> # 4.0.x Tested-by: Sasha Levin <sasha.levin@oracle.com>	2015-03-20 08:27:33 -07:00
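The difference between the old and new force-quiescent-state checks can be shown with two predicates. This is a sketch under simplifying assumptions: struct leaf_node, its n_blkd_tasks counter (standing in for a non-empty ->blkd_tasks list), and both function names are hypothetical, and only the ->gp_tasks pointer mirrors a field named in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a leaf rcu_node structure. */
struct leaf_node {
	unsigned long qsmask;  /* online CPUs still owing a quiescent state */
	void *gp_tasks;        /* first queued reader blocking the current GP, or NULL */
	int n_blkd_tasks;      /* all queued readers, whichever GP they block */
};

/* Earlier check: repair only if no readers are queued at all. */
static bool fqs_should_report_old(const struct leaf_node *rnp)
{
	return rnp->qsmask == 0 && rnp->n_blkd_tasks == 0;
}

/* Revised check: repair if nothing blocks the *current* grace period. */
static bool fqs_should_report_new(const struct leaf_node *rnp)
{
	return rnp->qsmask == 0 && rnp->gp_tasks == NULL;
}
```

In the 4a-4c scenario, ->gp_tasks stays NULL while newly preempted readers keep the list non-empty, so the old predicate never fires and the grace period hangs, whereas the new one reports the quiescent state.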
Paul E. McKenney	5c60d25fa1	rcu: Add diagnostics to grace-period cleanup At grace-period initialization time, RCU checks that all quiescent states were really reported for the previous grace period. Now that grace-period cleanup has been split out of grace-period initialization, this commit also performs those checks at grace-period cleanup time. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-12 15:19:38 -07:00
Paul E. McKenney	88428cc5c2	rcu: Handle outgoing CPUs on exit from idle loop This commit informs RCU of an outgoing CPU just before that CPU invokes arch_cpu_idle_dead() during its last pass through the idle loop (via a new CPU_DYING_IDLE notifier value). This change means that RCU need not deal with outgoing CPUs passing through the scheduler after informing RCU that they are no longer online. Note that removing the CPU from the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time, and orphaning callbacks is still done at CPU_DEAD time, the reason being that at CPU_DEAD time we have another CPU that can adopt them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-12 15:19:38 -07:00
Paul E. McKenney	c199068913	rcu: Eliminate ->onoff_mutex from rcu_node structure Because RCU grace-period initialization need no longer exclude CPU-hotplug operations, this commit eliminates the ->onoff_mutex and its uses. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-12 15:19:37 -07:00
Paul E. McKenney	0aa04b055e	rcu: Process offlining and onlining only at grace-period start Races between CPU hotplug and grace periods can be difficult to resolve, so the ->onoff_mutex is used to exclude the two events. Unfortunately, this means that it is impossible for an outgoing CPU to perform the last bits of its offlining from its last pass through the idle loop, because sleeplocks cannot be acquired in that context. This commit avoids these problems by buffering online and offline events in a new ->qsmaskinitnext field in the leaf rcu_node structures. When a grace period starts, the events accumulated in this mask are applied to the ->qsmaskinit field, and, if needed, up the rcu_node tree. The special case of all CPUs corresponding to a given leaf rcu_node structure being offline while there are still elements in that structure's ->blkd_tasks list is handled using a new ->wait_blkd_tasks field. In this case, propagating the offline bits up the tree is deferred until the beginning of the grace period after all of the tasks have exited their RCU read-side critical sections and removed themselves from the list, at which point the ->wait_blkd_tasks flag is cleared. If one of that leaf rcu_node structure's CPUs comes back online before the list empties, then the ->wait_blkd_tasks flag is simply cleared. This of course means that RCU's notion of which CPUs are offline can be out of date. This is OK because RCU need only wait on CPUs that were online at the time that the grace period started. In addition, RCU's force-quiescent-state actions will handle the case where a CPU goes offline after the grace period starts. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-12 15:19:37 -07:00
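The buffering scheme described above can be sketched as follows: hotplug events only update a "next" mask, and that mask is copied into the active one when a grace period starts, with a flag recording that propagation of an all-offline leaf is being deferred because readers are still queued. This is an illustrative model, not the kernel code: struct hp_leaf, the event functions, gp_init_leaf(), and the n_blkd_tasks counter are hypothetical, and propagation up the tree and locking are reduced to a boolean return value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified leaf rcu_node structure. */
struct hp_leaf {
	unsigned long qsmaskinit;      /* CPUs the current grace period waits on */
	unsigned long qsmaskinitnext;  /* online CPUs as of the latest hotplug event */
	unsigned long qsmask;          /* CPUs yet to report for the current GP */
	int n_blkd_tasks;              /* stand-in for a non-empty ->blkd_tasks list */
	bool wait_blkd_tasks;
};

/* Hotplug events are only buffered; no exclusion with GP init is needed. */
static void cpu_online_event(struct hp_leaf *rnp, int bit)
{
	rnp->qsmaskinitnext |= 1UL << bit;
}

static void cpu_offline_event(struct hp_leaf *rnp, int bit)
{
	rnp->qsmaskinitnext &= ~(1UL << bit);
}

/*
 * At grace-period start, apply the buffered events.  Returns true if the
 * leaf's bit should be clear in its parent (no online CPUs and no queued
 * readers), false if the parent must keep waiting on this leaf.
 */
static bool gp_init_leaf(struct hp_leaf *rnp)
{
	rnp->qsmaskinit = rnp->qsmaskinitnext;
	rnp->qsmask = rnp->qsmaskinit;

	if (rnp->qsmaskinit) {
		rnp->wait_blkd_tasks = false;  /* a CPU came back online */
		return false;
	}
	if (rnp->n_blkd_tasks) {
		rnp->wait_blkd_tasks = true;   /* defer until the readers drain */
		return false;
	}
	rnp->wait_blkd_tasks = false;      /* list drained: propagate now */
	return true;
}
```

Because the active mask changes only at grace-period start, a CPU that goes offline mid-grace-period is still waited on, which is the case the commit hands off to force-quiescent-state processing.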
Paul E. McKenney	cc99a310ca	rcu: Move rcu_report_unblock_qs_rnp() to common code The rcu_report_unblock_qs_rnp() function is invoked when the last task blocking the current grace period exits its outermost RCU read-side critical section. Previously, this was called only from rcu_read_unlock_special(), and was therefore defined only when CONFIG_RCU_PREEMPT=y. However, this function will be invoked even when CONFIG_RCU_PREEMPT=n once CPU-hotplug operations are processed only at the beginnings of RCU grace periods. The reason for this change is that the last task on a given leaf rcu_node structure's ->blkd_tasks list might well exit its RCU read-side critical section between the time that recent CPU-hotplug operations were applied and when the new grace period was initialized. This situation could result in RCU waiting forever on that leaf rcu_node structure, because if all that structure's CPUs were already offline, there would be no quiescent-state events to drive that structure's part of the grace period. This commit therefore moves rcu_report_unblock_qs_rnp() to common code that is built unconditionally so that the quiescent-state-forcing code can clean up after this situation, avoiding the grace-period stall. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-12 15:19:36 -07:00
Paul E. McKenney	999c286347	rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs Offline CPUs cannot safely invoke trace events, but such CPUs do execute within rcu_cpu_notify(). Therefore, this commit removes the trace events from rcu_cpu_notify(). These trace events are for utilization, against which rcu_cpu_notify() execution time should be negligible. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-11 13:22:39 -07:00
Paul E. McKenney	37745d2810	rcu: Provide diagnostic option to slow down grace-period initialization Grace-period initialization normally proceeds quite quickly, so that it is very difficult to reproduce races against grace-period initialization. This commit therefore allows grace-period initialization to be artificially slowed down, increasing race-reproduction probability. A pair of new Kconfig parameters are provided, CONFIG_RCU_TORTURE_TEST_SLOW_INIT to enable the slowdowns, and CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY to specify the number of jiffies of slowdown to apply. A boot-time parameter named rcutree.gp_init_delay allows boot-time delay to be specified. By default, no delay will be applied even if CONFIG_RCU_TORTURE_TEST_SLOW_INIT is set. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-11 13:22:38 -07:00
Paul E. McKenney	237a0f2193	rcu: Detect stalls caused by failure to propagate up rcu_node tree If all CPUs have passed through quiescent states, then stalls might be due to starvation of the grace-period kthread or to failure to propagate the quiescent states up the rcu_node combining tree. The current stall warning messages do not differentiate, so this commit adds a printout of the root rcu_node structure's ->qsmask field. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-11 13:22:38 -07:00
Paul E. McKenney	78043c467a	rcu: Put all orphan-callback-related code under same comment Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-11 13:22:37 -07:00
Paul E. McKenney	b33078b609	rcu: Consolidate offline-CPU callback initialization Currently, both rcu_cleanup_dead_cpu() and rcu_send_cbs_to_orphanage() initialize the outgoing CPU's callback list. However, only rcu_cleanup_dead_cpu() invokes rcu_send_cbs_to_orphanage(), and it does so unconditionally, which means that only one of these initializations is required. This commit therefore consolidates the callback-list initialization with the rest of the callback handling in rcu_send_cbs_to_orphanage(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-11 13:22:36 -07:00
Yao Dongdong	9910affa89	rcu: Remove redundant check of cpu_online() Because invoke_rcu_core() checks whether the current CPU is online, there is no need for __call_rcu_core() to redundantly check it. There should not be any performance degradation because the called function is visible to the compiler. This commit therefore removes the redundant check. Signed-off-by: Yao Dongdong <yaodongdong@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-03 11:17:34 -08:00
Paul E. McKenney	e7580f3388	rcu: Get rcu_sched_force_quiescent_state() where it belongs The very similar functions rcu_force_quiescent_state(), rcu_bh_force_quiescent_state(), and rcu_sched_force_quiescent_state() are supposed to be together, but have drifted apart. This commit restores rcu_sched_force_quiescent_state() to its rightful place. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-03 11:17:19 -08:00
Paul E. McKenney	6629240575	rcu: Use IS_ENABLED() to remove CONFIG_RCU_FANOUT_EXACT #ifdef This commit uses IS_ENABLED() to remove the #ifdef from the rcu_init_levelspread() functions. No effect on executable code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-03 11:14:08 -08:00
Paul E. McKenney	4762767810	rcu: Move early boot callback tests earlier Because callbacks can now be posted quite early in boot, move the early boot callback tests to precede RCU initialization. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-03 11:06:22 -08:00
Paul E. McKenney	34404ca8fb	rcu: Move early-boot callbacks to no-CBs lists for no-CBs CPUs When a CPU is first determined to be a no-CBs CPU, this commit causes any early boot callbacks to be moved to the no-CBs callback list, allowing them to be invoked. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-03-03 11:06:02 -08:00
Paul E. McKenney	5871968d53	rcu: Tighten up affinity and check for sysidle If the RCU grace-period kthread invoking rcu_sysidle_check_cpu() happens to be running on the tick_do_timer_cpu initially, then rcu_bind_gp_kthread() won't bind it. This kthread might then migrate before invoking rcu_gp_fqs(), which will trigger the WARN_ON_ONCE() in rcu_sysidle_check_cpu(). This commit therefore makes rcu_bind_gp_kthread() do the binding even if the kthread is currently on the same CPU. Because this incurs added overhead, this commit also causes each RCU grace-period kthread to invoke rcu_bind_gp_kthread() once at boot rather than at the beginning of each grace period. And as long as rcu_bind_gp_kthread() is being modified, this commit eliminates its #ifdef. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-02-26 16:04:37 -08:00
Paul E. McKenney	675da67f24	rcu: Fixes to NO_HZ_FULL sysidle accounting On second and subsequent passes through quiescent-state forcing, the isidle variable was initialized to false, which would prevent full sysidle state from being reached if a grace period needed more than one round of quiescent-state forcing (which most should not). However, the check for offline CPUs in the quiescent-state forcing main loop had the wrong sense, which could prevent CPUs from ever entering full sysidle state. This commit fixes both of these bugs. Given that sysidle is not yet wired up, this has no effect in old kernels, but might have proven frustrating had anyone attempted to wire it up. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-02-26 12:11:03 -08:00
Paul E. McKenney	5afff48bdf	rcu: Update from rcu_expedited variable to rcu_gp_is_expedited() This commit updates open-coded tests of the rcu_expedited variable to instead use rcu_gp_is_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-02-26 12:03:01 -08:00
Paul E. McKenney	1925d1967c	rcu: Fix a couple of typos in rcu_all_qs() comment header Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-02-26 12:02:10 -08:00
Paul E. McKenney	39c8d313c3	rcu: Avoid clobbering early boot callbacks When a CPU comes online, it initializes its callback list. This is a bad thing if this is the first time that the CPU has come online and if that CPU has early boot callbacks. This commit therefore avoids initializing the callback list if there are callbacks present, in which case the initial call_rcu() did the initialization for us. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-02-26 12:01:30 -08:00
Paul E. McKenney	143da9c2fc	rcu: Prevent early-boot RCU callbacks from splatting Currently, a call_rcu() that precedes rcu_init() will splat due to the callback lists not having yet been initialized. This commit causes the first such callback to initialize the boot CPU's RCU callback list. Note that this commit does not change rcu_init()-time initialization, which means that the callback will be discarded at rcu_init() time. Fixing this is the job of later commits. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-02-26 12:01:28 -08:00
Paul E. McKenney	2723249a31	rcu: Wire ->rda pointers at compile time This commit wires up the rcu_state structures' ->rda pointers to the per-CPU rcu_data structures at compile time, thus ensuring that this linkage is present at early boot, in turn allowing posting of callbacks before rcu_init() is executed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-02-26 12:01:27 -08:00
Paul E. McKenney	d3f3f3f25b	rcu: Abstract default callback-list initialization from init_callback_list() In preparation for early-boot posting of callbacks, this commit abstracts initialization of the default (non-no-CB) callbacks list from the init_callback_list() function into a new init_default_callback_list() function. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-02-26 12:01:25 -08:00
Lai Jiangshan	3f47da0f32	rcu_tree: Avoid touching rnp->completed when a new GP is started In rcu_gp_init(), rnp->completed should equal rsp->completed in theory, so we normally don't need to touch it. If something goes wrong, the code will complain, fix up rnp->completed, and avoid an oops. This commit thus avoids the normal needless store to rnp->completed. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-02-25 17:03:05 -08:00
Paul E. McKenney	78e691f4ae	Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', 'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD doc.2015.01.07a: Documentation updates. fixes.2015.01.15a: Miscellaneous fixes. preempt.2015.01.06a: Changes to handling of lists of preempted tasks. srcu.2015.01.06a: SRCU updates. stall.2015.01.16a: RCU CPU stall-warning updates and fixes. torture.2015.01.11a: RCU torture-test updates and fixes.	2015-01-15 23:34:34 -08:00
Paul E. McKenney	fb81a44b88	rcu: Add GP-kthread-starvation checks to CPU stall warnings This commit adds a message that is printed if the relevant grace-period kthread has not been able to run for the two seconds preceding the stall warning. (The two seconds is double the maximum interval between successive bouts of quiescent-state forcing.) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-15 23:33:15 -08:00
Paul E. McKenney	5cd37193ce	rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used in places where it would be useful for it to apply to the normal RCU flavors, rcu_preempt, rcu_sched, and rcu_bh. This is especially the case for workloads that aggressively overload the system, particularly those that generate large numbers of RCU updates on systems running NO_HZ_FULL CPUs. This commit therefore communicates quiescent states from cond_resched_rcu_qs() to the normal RCU flavors. Note that it is unfortunately necessary to leave the old ->passed_quiesce mechanism in place to allow quiescent states that apply to only one flavor to be recorded. (Yes, we could decrement ->rcu_qs_ctr_snap in that case, but that is not so good for debugging of RCU internals.) In addition, if one of the RCU flavors' grace periods has stalled, this will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight quiescent state visible from other CPUs. Reported-by: Sasha Levin <sasha.levin@oracle.com> Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu() was used in preemptible code. ]	2015-01-15 23:33:14 -08:00
Paul E. McKenney	a94844b22a	rcu: Optionally run grace-period kthreads at real-time priority Recent testing has shown that under heavy load, running RCU's grace-period kthreads at real-time priority can improve performance (according to 0day test robot) and reduce the incidence of RCU CPU stall warnings. However, most systems do just fine with the default non-realtime priorities for these kthreads, and it does not make sense to expose the entire user base to any risk stemming from this change, given that this change is of use only to a few users running extremely heavy workloads. Therefore, this commit allows users to specify realtime priorities for the grace-period kthreads, but leaves them running SCHED_OTHER by default. The realtime priority may be specified at build time via the RCU_KTHREAD_PRIO Kconfig parameter, or at boot time via the rcutree.kthread_prio parameter. Either way, 0 says to continue the default SCHED_OTHER behavior and values from 1-99 specify that priority of SCHED_FIFO behavior. Note that a value of 0 is not permitted when the RCU_BOOST Kconfig parameter is specified. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-15 23:25:04 -08:00
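The priority rules in the commit above (0 keeps the default SCHED_OTHER behavior, 1-99 selects that SCHED_FIFO priority, and 0 is not permitted with RCU_BOOST) can be captured in a small mapping function. The sketch assumes that out-of-range requests are clamped rather than rejected; that policy, the enum, and choose_gp_prio() are all hypothetical illustrations, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for SCHED_OTHER and SCHED_FIFO. */
enum gp_policy { GP_POLICY_OTHER, GP_POLICY_FIFO };

struct gp_prio {
	enum gp_policy policy;
	int prio;
};

/*
 * Map a requested kthread_prio value onto a scheduling choice.
 * Assumption: invalid values are clamped into range, with RCU_BOOST
 * forcing at least priority 1 because 0 is not permitted there.
 */
static struct gp_prio choose_gp_prio(int kthread_prio, bool rcu_boost)
{
	struct gp_prio c;

	if (rcu_boost && kthread_prio < 1)
		kthread_prio = 1;
	else if (kthread_prio < 0)
		kthread_prio = 0;
	else if (kthread_prio > 99)
		kthread_prio = 99;

	c.policy = kthread_prio ? GP_POLICY_FIFO : GP_POLICY_OTHER;
	c.prio = kthread_prio;
	return c;
}
```

Clamping keeps a mistyped boot parameter from leaving the grace-period kthreads unconfigured; rejecting the value outright would be an equally reasonable design.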
Paul E. McKenney	917963d0b3	rcutorture: Check from beginning to end of grace period Currently, rcutorture's Reader Batch checks measure from the end of the previous grace period to the end of the current one. This commit tightens up these checks by measuring from the start and end of the same grace period. This involves adding rcu_batches_started() and friends corresponding to the existing rcu_batches_completed() and friends. We leave SRCU alone for the moment, as it does not yet have a way of tracking both ends of its grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-10 19:08:02 -08:00
Paul E. McKenney	9733e4f0a9	rcu: Make _batches_completed() functions return unsigned long Long ago, the various ->completed fields were of type long, but now are unsigned long due to signed-integer-overflow concerns. However, the various _batches_completed() functions remained of type long, even though their only purpose in life is to return the corresponding ->completed field. This patch cleans this up by changing these functions' return types to unsigned long. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-10 19:07:56 -08:00
Paul E. McKenney	e3663b1024	rcu: Handle gpnum/completed wrap while dyntick idle Subtle race conditions can result if a CPU stays in dyntick-idle mode long enough for the ->gpnum and ->completed fields to wrap. For example, consider the following sequence of events: o CPU 1 encounters a quiescent state while waiting for grace period 5 to complete, but then enters dyntick-idle mode. o While CPU 1 is in dyntick-idle mode, the grace-period counters wrap around so that the grace period number is now 4. o Just as CPU 1 exits dyntick-idle mode, grace period 4 completes and grace period 5 begins. o The quiescent state that CPU 1 passed through during the old grace period 5 looks like it applies to the new grace period 5. Therefore, the new grace period 5 completes without CPU 1 having passed through a quiescent state. This could clearly be a fatal surprise to any long-running RCU read-side critical section that happened to be running on CPU 1 at the time. At one time, this was not a problem, given that it takes significant time for the grace-period counters to overflow even on 32-bit systems. However, with the advent of NO_HZ_FULL and SMP embedded systems, arbitrarily long idle periods are now becoming quite feasible. It is therefore time to close this race. This commit therefore avoids this race condition by having the quiescent-state forcing code detect when a CPU is falling too far behind, and setting a new rcu_data field ->gpwrap when this happens. Whenever this new ->gpwrap field is set, the CPU's ->gpnum and ->completed fields are known to be untrustworthy, and can be ignored, along with any associated quiescent states. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:05:28 -08:00
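Detecting that a CPU has fallen "too far behind" has to survive counter wrap itself, so the comparison is done modulo the counter width. The sketch below uses the wrap-safe ULONG_CMP_LT() idiom and flags a CPU whose snapshot lags by more than a quarter of the counter space. The quarter threshold, struct gp_view, and check_gp_wrap() are assumptions made for illustration, not taken from the commit.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wrap-safe "a is before b" for free-running unsigned long counters. */
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

/* Hypothetical stand-in for the relevant per-CPU rcu_data fields. */
struct gp_view {
	unsigned long gpnum;  /* last grace-period number this CPU noticed */
	bool gpwrap;          /* set once ->gpnum can no longer be trusted */
};

/*
 * Called from quiescent-state forcing: if the CPU's snapshot lags the
 * global counter by more than ULONG_MAX / 4 (an illustrative threshold),
 * mark its ->gpnum and any associated quiescent state as untrustworthy.
 */
static void check_gp_wrap(struct gp_view *cpu, unsigned long global_gpnum)
{
	if (ULONG_CMP_LT(cpu->gpnum + ULONG_MAX / 4, global_gpnum))
		cpu->gpwrap = true;
}
```

A CPU whose counter merely crossed the numeric wrap point (for example, from ULONG_MAX - 2 to 3) is not flagged, because the unsigned subtraction still sees it as only a few grace periods behind.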
Paul E. McKenney	6ccd2ecd42	rcu: Improve diagnostics for spurious RCU CPU stall warnings The current RCU CPU stall warning code will print "Stall ended before state dump start" any time that the stall-warning code is triggered on a CPU that has already reported a quiescent state for the current grace period and if all quiescent states have been reported for the current grace period. However, a true stall can result in these symptoms, for example, by preventing RCU's grace-period kthreads from ever running. This commit therefore checks for this condition, reporting the end of the stall only if one of the grace-period counters has actually advanced. Otherwise, it reports the last time that the grace-period kthread made meaningful progress. (In normal situations, the grace-period kthread should make meaningful progress at least every jiffies_till_next_fqs jiffies.) Reported-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Miroslav Benes <mbenes@suse.cz>	2015-01-06 11:05:27 -08:00
Paul E. McKenney	fc908ed33e	rcu: Make RCU_CPU_STALL_INFO include number of fqs attempts One way that an RCU CPU stall warning can happen is if the grace-period kthread is not allowed to execute. One proxy for this kthread's forward progress is the number of force-quiescent-state (fqs) scans. This commit therefore adds the number of fqs scans to the RCU CPU stall warning printouts when CONFIG_RCU_CPU_STALL_INFO=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:05:25 -08:00
Paul E. McKenney	ab954c167e	rcu: Remove redundant callback-list initialization The RCU callback lists are initialized in both rcu_boot_init_percpu_data() and rcu_init_percpu_data(). The former is intended for initializing immutable data, so this commit removes the initialization from rcu_boot_init_percpu_data() and leaves it in rcu_init_percpu_data(). This change prepares for permitting callbacks to be queued very early in boot. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:54 -08:00
Paul E. McKenney	6cd534ef8b	rcu: Don't scan root rcu_node structure for stalled tasks Now that blocked tasks are no longer migrated to the root rcu_node structure, there is no need to scan the root rcu_node structure for blocked tasks stalling the current grace period. This commit therefore removes this scan. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:53 -08:00
Paul E. McKenney	3ba4d0e09b	rcu: Note quiescent state when CPU goes offline The rcu_cleanup_dead_cpu() function (called after a CPU has gone completely offline) has not reported a quiescent state because there was probably at least one synchronize_rcu() between the time the CPU went offline and the CPU_DEAD notifier, and this would have detected the CPU's offline state via quiescent-state forcing. However, the plan is for CPUs to take themselves offline, at which point it makes sense for them to report their own quiescent state. This commit makes this change in preparation for the new CPU-hotplug setup. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:51 -08:00
Paul E. McKenney	1be0085b51	rcu: Don't initiate RCU priority boosting on root rcu_node Because there are no longer any preempted tasks on the root rcu_node, and because there is no longer ever an rcub kthread for the root rcu_node, this commit drops the code in force_qs_rnp() that attempts to awaken the non-existent root rcub kthread. This is strictly a performance enhancement, removing a root rcu_node ->lock acquisition and release along with some tests in rcu_initiate_boost(), ending with the test that notes that there is no rcub kthread. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:48 -08:00
Paul E. McKenney	a8f4cbadfb	rcu: Shorten irq-disable region in rcu_cleanup_dead_cpu() Now that we are not migrating callbacks, there is no need to hold the ->orphan_lock across the ->qsmaskinit bit-clearing process. This commit therefore releases ->orphan_lock immediately after adopting the orphaned RCU callbacks. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:45 -08:00
Paul E. McKenney	d19fb8d1f3	rcu: Don't migrate blocked tasks even if all corresponding CPUs offline When the last CPU associated with a given leaf rcu_node structure goes offline, something must be done about the tasks queued on that rcu_node structure. Each of these tasks has been preempted on one of the leaf rcu_node structure's CPUs while in an RCU read-side critical section that it has not yet exited. Handling these tasks is the job of rcu_preempt_offline_tasks(), which migrates them from the leaf rcu_node structure to the root rcu_node structure. Unfortunately, this migration has to be done one task at a time because each task's allegiance must be shifted from the original leaf rcu_node to the root, so that future attempts to deal with these tasks will acquire the root rcu_node structure's ->lock rather than that of the leaf. Worse yet, this migration must be done with interrupts disabled, which is not so good for realtime response, especially given that there is no bound on the number of tasks on a given rcu_node structure's list. (OK, OK, there is a bound, it is just that it is unreasonably large, especially on 64-bit systems.) This was not considered a problem back when rcu_preempt_offline_tasks() was first written because realtime systems were assumed not to do CPU-hotplug operations while real-time applications were running. This assumption has proved of dubious validity given that people are starting to run multiple realtime applications on a single SMP system and that it is common practice to offline then online a CPU before starting its real-time application in order to clear extraneous processing off of that CPU. So we now need CPU hotplug operations to avoid undue latencies. This commit therefore avoids migrating these tasks, instead letting them be dequeued one by one from the original leaf rcu_node structure by rcu_read_unlock_special(). 
This means that the clearing of bits from the upper-level rcu_node structures must be deferred until the last such task has been dequeued, because otherwise subsequent grace periods won't wait on them. This commit has the beneficial side effect of simplifying the CPU-hotplug code for TREE_PREEMPT_RCU, especially in CONFIG_RCU_BOOST builds. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:44 -08:00
Paul E. McKenney	b6a932d1d9	rcu: Make rcu_read_unlock_special() propagate ->qsmaskinit bit clearing This commit causes rcu_read_unlock_special() to propagate ->qsmaskinit bit clearing up the rcu_node tree once a given rcu_node structure's blkd_tasks list becomes empty. This is the final commit in preparation for the rework of RCU priority boosting: It enables preempted tasks to remain queued on their rcu_node structure even after all of that rcu_node structure's CPUs have gone offline. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:43 -08:00
Paul E. McKenney	8af3a5e78c	rcu: Abstract rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu() This commit abstracts rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu() in preparation for the rework of RCU priority boosting. This new function will be invoked from rcu_read_unlock_special() in the reworked scheme, which is why rcu_cleanup_dead_rnp() assumes that the leaf rcu_node structure's ->qsmaskinit field has already been updated. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:02:41 -08:00
Paul E. McKenney	41050a0096	rcu: Fix rcu_barrier() race that could result in too-short wait The rcu_barrier() no-callbacks check for no-CBs CPUs has race conditions. It checks a given CPU's lists of callbacks, and if all three no-CBs lists are empty, ignores that CPU. However, these three lists could potentially be empty even when callbacks are present if the check executed just as the callbacks were being moved from one list to another. It turns out that recent versions of rcutorture can spot this race. This commit plugs this hole by consolidating the per-list counts of no-CBs callbacks into a single count, which is incremented before the corresponding callback is posted and after it is invoked. Then rcu_barrier() checks this single count to reliably determine whether the corresponding CPU has no-CBs callbacks. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:01:15 -08:00
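The race and its fix can be modeled with a single atomic counter per no-CBs CPU: it is incremented before a callback is posted and decremented only after the callback has been invoked, so moving callbacks between internal lists never makes the count transiently zero. This C11 sketch is a hypothetical model (struct nocb_cpu and the three functions are illustrative names); the list manipulation itself is elided.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a no-CBs CPU's callback bookkeeping. */
struct nocb_cpu {
	atomic_long n_cbs;  /* one count spanning all three no-CBs lists */
};

static void post_callback(struct nocb_cpu *cpu)
{
	atomic_fetch_add(&cpu->n_cbs, 1);  /* count before enqueueing */
	/* ...enqueue onto the appropriate list (elided)... */
}

static void invoke_callback(struct nocb_cpu *cpu)
{
	/* ...dequeue and run the callback (elided)... */
	atomic_fetch_sub(&cpu->n_cbs, 1);  /* uncount only after invocation */
}

/* rcu_barrier()-style check: may this CPU be skipped? */
static bool barrier_can_skip(struct nocb_cpu *cpu)
{
	return atomic_load(&cpu->n_cbs) == 0;
}
```

Checking three separate lists can observe all of them empty mid-move; a single count that brackets the callback's whole lifetime cannot.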
Lai Jiangshan	5f6130fa52	tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task For RCU in UP, context-switch = QS = GP, thus we can force a context-switch when any call_rcu_[bh|sched]() happens on idle_task. After doing so, rcu_idle/irq_enter/exit() are useless, so we can simply make these functions empty. More importantly, this change does not change the functionality logically. Note: raise_softirq(RCU_SOFTIRQ)/rcu_sched_qs() in rcu_idle_enter() and outmost rcu_irq_exit() will have to wake up the ksoftirqd (due to in_interrupt() == 0).
Before this patch                      After this patch:
call_rcu_sched() in idle;              call_rcu_sched() in idle
                                         set resched
do other stuffs;                       do other stuffs
outmost rcu_irq_exit()                 outmost rcu_irq_exit() (empty function)
(or rcu_idle_enter())                  (or rcu_idle_enter(), also empty function)
                                       start to resched. (see above)
rcu_sched_qs()                         rcu_sched_qs()
QS, and GP and advance cb              QS, and GP and advance cb
wake up the ksoftirqd                  wake up the ksoftirqd
  set resched
resched to ksoftirqd (or other)        resched to ksoftirqd (or other)
These two code paths are almost the same. Size change after patching:
size kernel/rcu/tiny-old.o kernel/rcu/tiny-patched.o
   text    data     bss     dec     hex filename
   3449     206       8    3663     e4f kernel/rcu/tiny-old.o
   2406     144       8    2558     9fe kernel/rcu/tiny-patched.o
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2015-01-06 11:01:12 -08:00
Paul E. McKenney	924df8a011	rcu: Fix invoke_rcu_callbacks() comment Despite what the comment says, it is only softirqs that are disabled, not interrupts. This commit therefore fixes the comment. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-12-30 17:40:19 -08:00
Paul E. McKenney	734d168013	rcu: Make rcu_nmi_enter() handle nesting The x86 architecture has multiple types of NMI-like interrupts: real NMIs, machine checks, and, for some values of NMI-like, debugging and breakpoint interrupts. These interrupts can nest inside each other. Andy Lutomirski is adding RCU support to these interrupts, so rcu_nmi_enter() and rcu_nmi_exit() must now correctly handle nesting. This commit therefore introduces nesting, using a clever NMI-coordination algorithm suggested by Andy. The trick is to atomically increment ->dynticks (if needed) before manipulating ->dynticks_nmi_nesting on entry (and, accordingly, after on exit). In addition, ->dynticks_nmi_nesting is incremented by one if ->dynticks was incremented and by two otherwise. This means that when rcu_nmi_exit() sees ->dynticks_nmi_nesting equal to one, it knows that ->dynticks must be atomically incremented. This NMI-coordination algorithm has been validated by the following Promela model:
------------------------------------------------------------------------
/*
 * Promela model for Andy Lutomirski's suggested change to rcu_nmi_enter()
 * that allows nesting.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, you can access it online at
 * http://www.gnu.org/licenses/gpl-2.0.html.
 *
 * Copyright IBM Corporation, 2014
 *
 * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
 */

byte dynticks_nmi_nesting = 0;
byte dynticks = 0;

/*
 * Promela version of rcu_nmi_enter().
 */
inline rcu_nmi_enter()
{
    byte incby;
    byte tmp;

    incby = BUSY_INCBY;
    assert(dynticks_nmi_nesting >= 0);
    if
    :: (dynticks & 1) == 0 ->
        atomic {
            dynticks = dynticks + 1;
        }
        assert((dynticks & 1) == 1);
        incby = 1;
    :: else ->
        skip;
    fi;
    tmp = dynticks_nmi_nesting;
    tmp = tmp + incby;
    dynticks_nmi_nesting = tmp;
    assert(dynticks_nmi_nesting >= 1);
}

/*
 * Promela version of rcu_nmi_exit().
 */
inline rcu_nmi_exit()
{
    byte tmp;

    assert(dynticks_nmi_nesting > 0);
    assert((dynticks & 1) != 0);
    if
    :: dynticks_nmi_nesting != 1 ->
        tmp = dynticks_nmi_nesting;
        tmp = tmp - BUSY_INCBY;
        dynticks_nmi_nesting = tmp;
    :: else ->
        dynticks_nmi_nesting = 0;
        atomic {
            dynticks = dynticks + 1;
        }
        assert((dynticks & 1) == 0);
    fi;
}

/*
 * Base-level NMI runs non-atomically. Crudely emulates process-level
 * dynticks-idle entry/exit.
 */
proctype base_NMI()
{
    byte busy;

    busy = 0;
    do
    :: /* Emulate base-level dynticks and not. */
        if
        :: 1 -> atomic {
                dynticks = dynticks + 1;
            }
            busy = 1;
        :: 1 -> skip;
        fi;

        /* Verify that we only sometimes have base-level dynticks. */
        if
        :: busy == 0 -> skip;
        :: busy == 1 -> skip;
        fi;

        /* Model RCU's NMI entry and exit actions. */
        rcu_nmi_enter();
        assert((dynticks & 1) == 1);
        rcu_nmi_exit();

        /* Emulated re-entering base-level dynticks and not. */
        if
        :: !busy -> skip;
        :: busy -> atomic {
                dynticks = dynticks + 1;
            }
            busy = 0;
        fi;

        /* We had better now be in dyntick-idle mode. */
        assert((dynticks & 1) == 0);
    od;
}

/*
 * Nested NMI runs atomically to emulate interrupting base_level().
 */
proctype nested_NMI()
{
    do
    :: /*
        * Use an atomic section to model a nested NMI. This is
        * guaranteed to interleave into base_NMI() between a pair
        * of base_NMI() statements, just as a nested NMI would.
        */
        atomic {
            /* Verify that we only sometimes are in dynticks. */
            if
            :: (dynticks & 1) == 0 -> skip;
            :: (dynticks & 1) == 1 -> skip;
            fi;

            /* Model RCU's NMI entry and exit actions. */
            rcu_nmi_enter();
            assert((dynticks & 1) == 1);
            rcu_nmi_exit();
        }
    od;
}

init {
    run base_NMI();
    run nested_NMI();
}
------------------------------------------------------------------------
The following script can be used to run this model if placed in rcu_nmi.spin:
------------------------------------------------------------------------
if ! spin -a rcu_nmi.spin
then
    echo Spin errors!!!
    exit 1
fi
if ! cc -DSAFETY -o pan pan.c
then
    echo Compilation errors!!!
    exit 1
fi
./pan -m100000

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2014-12-30 17:40:16 -08:00
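For readers who prefer C to Promela, the nesting rule validated by the model in commit 734d168013 might be sketched as follows. This is a single-threaded, user-space approximation under stated assumptions. Nested calls stand in for nested NMIs. `SKETCH_BUSY_INCBY` is assumed to be 2, matching "by two otherwise". Memory barriers and true concurrency are not modeled. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_BUSY_INCBY 2 /* assumed: nesting grows by two when RCU already watches */

/* Hypothetical single-CPU state mirroring the commit's two counters. */
static atomic_int sketch_dynticks;      /* odd: RCU watching; even: idle */
static int sketch_dynticks_nmi_nesting; /* weighted NMI nesting depth */

static void sketch_nmi_enter(void)
{
    int incby = SKETCH_BUSY_INCBY;

    if ((atomic_load(&sketch_dynticks) & 1) == 0) {
        /* RCU was not watching: make it watch BEFORE bumping nesting. */
        atomic_fetch_add(&sketch_dynticks, 1);
        incby = 1;
    }
    sketch_dynticks_nmi_nesting += incby;
}

static void sketch_nmi_exit(void)
{
    assert(sketch_dynticks_nmi_nesting > 0);
    assert((atomic_load(&sketch_dynticks) & 1) != 0);
    if (sketch_dynticks_nmi_nesting != 1) {
        /* Nested NMI, or outermost NMI that found RCU already watching. */
        sketch_dynticks_nmi_nesting -= SKETCH_BUSY_INCBY;
        return;
    }
    /* Outermost NMI that started RCU watching: clear nesting, then stop. */
    sketch_dynticks_nmi_nesting = 0;
    atomic_fetch_add(&sketch_dynticks, 1);
}
```

A value of one can only mean "outermost NMI that had to start RCU watching". This is why exit can tell that case apart without any extra state.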
Paul E. McKenney	9ea6c58856	Merge branches 'torture.2014.11.03a', 'cpu.2014.11.03a', 'doc.2014.11.13a', 'fixes.2014.11.13a', 'signal.2014.10.29a' and 'rt.2014.10.29a' into HEAD cpu.2014.11.03a: Changes for per-CPU variables. doc.2014.11.13a: Documentation updates. fixes.2014.11.13a: Miscellaneous fixes. signal.2014.10.29a: Signal changes. rt.2014.10.29a: Real-time changes. torture.2014.11.03a: torture-test changes.	2014-11-13 10:39:04 -08:00
Pranith Kumar	aa23c6fbc5	rcutorture: Add early boot self tests Add early boot self tests for RCU under CONFIG_PROVE_RCU. Currently, the only test posts a dummy callback that increments a counter, which is then verified after calling rcu_barrier*(). Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-11-03 19:26:37 -08:00
Paul E. McKenney	8fa7845df5	rcu: Remove "cpu" argument to rcu_cleanup_after_idle() The "cpu" argument to rcu_cleanup_after_idle() is always the current CPU, so drop it. This moves the smp_processor_id() from the caller to rcu_cleanup_after_idle(), saving argument-passing overhead. Again, the anticipated cross-CPU uses of these functions have been replaced by NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2014-11-03 19:20:56 -08:00
Paul E. McKenney	198bbf8127	rcu: Remove "cpu" argument to rcu_prepare_for_idle() The "cpu" argument to rcu_prepare_for_idle() is always the current CPU, so drop it. This in turn allows two of the uses of "cpu" in this function to be replaced with a this_cpu_ptr() and the third by smp_processor_id(), replacing the smp_processor_id() formerly passed in the call to rcu_prepare_for_idle(). Again, the anticipated cross-CPU uses of these functions have been replaced by NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2014-11-03 19:20:49 -08:00
Paul E. McKenney	aa6da5140b	rcu: Remove "cpu" argument to rcu_needs_cpu() The "cpu" argument to rcu_needs_cpu() is always the current CPU, so drop it. This in turn allows the "cpu" argument to rcu_cpu_has_callbacks() to be removed, which allows the uses of "cpu" in both functions to be replaced with a this_cpu_ptr(). Again, the anticipated cross-CPU uses of these functions have been replaced by NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2014-11-03 19:20:43 -08:00
Paul E. McKenney	38200cf247	rcu: Remove "cpu" argument to rcu_note_context_switch() The "cpu" argument to rcu_note_context_switch() is always the current CPU, so drop it. This in turn allows the "cpu" argument to rcu_preempt_note_context_switch() to be removed, which allows the sole use of "cpu" in both functions to be replaced with a this_cpu_ptr(). Again, the anticipated cross-CPU uses of these functions have been replaced by NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2014-11-03 19:20:34 -08:00
Paul E. McKenney	86aea0e6e7	rcu: Remove "cpu" argument to rcu_preempt_check_callbacks() Because rcu_preempt_check_callbacks()'s argument is guaranteed to always be the current CPU, drop the argument and replace per_cpu() with __this_cpu_read(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2014-11-03 19:20:26 -08:00
Paul E. McKenney	e3950ecd55	rcu: Remove "cpu" argument to rcu_pending() Because rcu_pending()'s argument is guaranteed to always be the current CPU, drop the argument and replace per_cpu_ptr() with this_cpu_ptr(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2014-11-03 19:20:18 -08:00
Paul E. McKenney	c3377c2da6	rcu: Remove "cpu" argument to rcu_check_callbacks() The "cpu" argument was kept around on the off-chance that RCU might offload scheduler-clock interrupts. However, this offload approach has been replaced by NO_HZ_FULL, which offloads -all- RCU processing from qualifying CPUs. It is therefore time to remove the "cpu" argument to rcu_check_callbacks(), which this commit does. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2014-11-03 19:20:11 -08:00
Paul E. McKenney	11bbb235c2	rcu: Use DEFINE_PER_CPU_SHARED_ALIGNED for rcu_data The rcu_data per-CPU variable has a number of fields that are atomically manipulated, potentially by any CPU. This situation can result in false sharing with per-CPU variables that have the misfortune of being allocated adjacent to rcu_data in memory. This commit therefore changes the DEFINE_PER_CPU() to DEFINE_PER_CPU_SHARED_ALIGNED() in order to avoid this false sharing. Reported-by: Christoph Lameter <cl@linux.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Christoph Lameter <cl@linux.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2014-11-03 19:20:03 -08:00
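As a user-space analogy, and not the kernel's per-CPU machinery, the false-sharing fix in commit 11bbb235c2 amounts to giving each CPU's slot whole cache lines of its own. The sketch assumes a 64-byte cache line and uses C11 `alignas`. `sketch_rcu_data` is a hypothetical stand-in for `rcu_data`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64 /* assumed cache-line size */
#define SKETCH_NR_CPUS 4

/* Hypothetical stand-in for rcu_data. Aligning the first member to a
 * cache line also rounds sizeof() up to a multiple of the line, so
 * array neighbours never share a line with this slot. */
struct sketch_rcu_data {
    alignas(SKETCH_CACHE_LINE) long qs_pending; /* atomically updated by any CPU */
    long n_cbs;
};

static_assert(sizeof(struct sketch_rcu_data) % SKETCH_CACHE_LINE == 0,
              "each per-CPU slot must occupy whole cache lines");

/* One slot per CPU, in the spirit of DEFINE_PER_CPU_SHARED_ALIGNED(). */
static struct sketch_rcu_data sketch_rcu_data_percpu[SKETCH_NR_CPUS];
```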
Christoph Lameter	28ced795cb	rcu: Remove rcu_dynticks * parameters when they are always this_cpu_ptr(&rcu_dynticks) For some functions in kernel/rcu/tree* the rdtp parameter is always this_cpu_ptr(&rcu_dynticks). Remove the parameter if constant and calculate the pointer in the function. This has the advantage of making it obvious that the addresses are all per-CPU offsets, and thus it will enable the use of this_cpu_ops in the future. Signed-off-by: Christoph Lameter <cl@linux.com> [ paulmck: Forward-ported to rcu/dev, whitespace adjustment. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2014-11-03 19:19:26 -08:00
Paul E. McKenney	776d680711	rcu: Kick rcuo kthreads after their CPU goes offline If a no-CBs CPU were to post an RCU callback with interrupts disabled after it entered the idle loop for the last time, there might be no deferred wakeup for the corresponding rcuo kthreads. This commit therefore adds a set of calls to do_nocb_deferred_wakeup() after the CPU has gone completely offline. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-10-29 10:20:07 -07:00
Paul E. McKenney	e0775cefb5	rcu: Avoid IPIing idle CPUs from synchronize_sched_expedited() Currently, synchronize_sched_expedited() sends IPIs to all online CPUs, even those that are idle or executing in nohz_full= userspace. Because idle CPUs and nohz_full= userspace CPUs are in extended quiescent states, there is no need to IPI them in the first place. This commit therefore avoids IPIing CPUs that are already in extended quiescent states. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-10-28 13:49:30 -07:00
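The filtering step in commit e0775cefb5 can be sketched with the dyntick-idle parity convention, in which an even counter value means that the CPU is in an extended quiescent state. The array, CPU count, and function name below are hypothetical. The real code must also order its snapshot correctly with respect to the start of the expedited grace period, and this sketch omits that.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_NR_CPUS 8

/* Hypothetical per-CPU counters following the dyntick-idle convention:
 * even value = extended quiescent state (idle or nohz_full userspace),
 * odd value  = CPU might be running RCU readers. */
static atomic_uint sketch_dynticks[SKETCH_NR_CPUS];

/* Return a bitmask of the CPUs that still need an IPI. */
static unsigned int sketch_cpus_needing_ipi(void)
{
    unsigned int mask = 0;

    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        if (atomic_load(&sketch_dynticks[cpu]) & 1)
            mask |= 1u << cpu; /* not in an extended QS: IPI it */
    }
    return mask;
}
```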
Paul E. McKenney	61cfd0970e	rcu: Move RCU_BOOST variable declarations, eliminating #ifdef There are some RCU_BOOST-specific per-CPU variable declarations that are needlessly defined under #ifdef in kernel/rcu/tree.c. This commit therefore moves these declarations into a pre-existing #ifdef in kernel/rcu/tree_plugin.h. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-10-28 13:49:28 -07:00
Paul E. McKenney	d7e2993396	rcu: Make rcu_barrier() understand about missing rcuo kthreads Commit `35ce7f29a4` (rcu: Create rcuo kthreads only for onlined CPUs) avoids creating rcuo kthreads for CPUs that never come online. This fixes a bug in many instances of firmware: Instead of lying about their age, these systems instead lie about the number of CPUs that they have. Before commit `35ce7f29a4`, this could result in huge numbers of useless rcuo kthreads being created. It appears that experience indicates that I should have told the people suffering from this problem to fix their broken firmware, but I instead produced what turned out to be a partial fix. The missing piece supplied by this commit makes sure that rcu_barrier() knows not to post callbacks for no-CBs CPUs that have not yet come online, because otherwise rcu_barrier() will hang on systems having firmware that lies about the number of CPUs. It is tempting to simply have rcu_barrier() refuse to post a callback on any no-CBs CPU that does not have an rcuo kthread. This unfortunately does not work because rcu_barrier() is required to wait for all pending callbacks. It is therefore required to wait even for those callbacks that cannot possibly be invoked. Even if doing so hangs the system. Given that posting a callback to a no-CBs CPU that does not yet have an rcuo kthread can hang rcu_barrier(), it is tempting to report an error in this case. Unfortunately, this will result in false positives at boot time, when it is perfectly legal to post callbacks to the boot CPU before the scheduler has started, in other words, before it is legal to invoke rcu_barrier(). So this commit instead has rcu_barrier() avoid posting callbacks to CPUs having neither rcuo kthread nor pending callbacks, and has it complain bitterly if it finds CPUs having no rcuo kthread but some pending callbacks. 
And when rcu_barrier() does find CPUs having no rcuo kthread but pending callbacks, as noted earlier, it has no choice but to hang indefinitely. Reported-by: Yanko Kaneti <yaneti@declera.com> Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reported-by: Meelis Roos <mroos@linux.ee> Reported-by: Eric B Munson <emunson@akamai.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Eric B Munson <emunson@akamai.com> Tested-by: Jay Vosburgh <jay.vosburgh@canonical.com> Tested-by: Yanko Kaneti <yaneti@declera.com> Tested-by: Kevin Fenzi <kevin@scrye.com> Tested-by: Meelis Roos <mroos@linux.ee>	2014-10-28 13:24:13 -07:00
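The per-CPU decision described in commit d7e2993396 can be summarized as a small classification function. This is a sketch only. The enum and function names are hypothetical, and a message on stderr stands in for the "complain bitterly" step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcomes of rcu_barrier()'s per-CPU decision for a no-CBs CPU. */
enum sketch_barrier_action {
    SKETCH_SKIP_CPU,     /* neither rcuo kthread nor callbacks: nothing to wait for */
    SKETCH_POST_CB,      /* rcuo kthread present: post the barrier callback */
    SKETCH_WARN_AND_WAIT /* callbacks but no rcuo kthread: complain, then wait forever */
};

static enum sketch_barrier_action
sketch_nocb_barrier_action(bool has_rcuo_kthread, long n_pending_cbs)
{
    assert(n_pending_cbs >= 0);
    if (has_rcuo_kthread)
        return SKETCH_POST_CB;
    if (n_pending_cbs == 0)
        return SKETCH_SKIP_CPU;
    /* rcu_barrier() must still wait for these callbacks, so it will hang. */
    fprintf(stderr, "sketch: CPU has %ld callbacks but no rcuo kthread\n",
            n_pending_cbs);
    return SKETCH_WARN_AND_WAIT;
}
```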
Paul E. McKenney	dd56af42bd	rcu: Eliminate deadlock between CPU hotplug and expedited grace periods Currently, the expedited grace-period primitives do get_online_cpus(). This greatly simplifies their implementation, but means that calls to them holding locks that are acquired by CPU-hotplug notifiers (to say nothing of calls to these primitives from CPU-hotplug notifiers) can deadlock. But this is starting to become inconvenient, as can be seen here: https://lkml.org/lkml/2014/8/5/754. The problem in this case is that some developers need to acquire a mutex from a CPU-hotplug notifier, but also need to hold it across a synchronize_rcu_expedited(). As noted above, this currently results in deadlock. This commit avoids the deadlock and retains the simplicity by creating a try_get_online_cpus(), which returns false if the get_online_cpus() reference count could not immediately be incremented. If a call to try_get_online_cpus() returns true, the expedited primitives operate as before. If a call returns false, the expedited primitives fall back to normal grace-period operations. This falling back of course results in increased grace-period latency, but only during times when CPU hotplug operations are actually in flight. The effect should therefore be negligible during normal operation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Tested-by: Lan Tianyu <tianyu.lan@intel.com>	2014-09-18 16:22:27 -07:00
Paul E. McKenney	96b4672703	Merge branch 'rcu-tasks.2014.09.10a' into HEAD rcu-tasks.2014.09.10a: Add RCU-tasks flavor of RCU.	2014-09-16 10:10:44 -07:00
Paul E. McKenney	e98d06dd6c	Merge branches 'doc.2014.09.07a', 'fixes.2014.09.10a', 'nocb-nohz.2014.09.16b' and 'torture.2014.09.07a' into HEAD doc.2014.09.07a: Documentation updates. fixes.2014.09.10a: Miscellaneous fixes. nocb-nohz.2014.09.16b: No-CBs CPUs and NO_HZ_FULL updates. torture.2014.09.07a: Torture-test updates.	2014-09-16 10:08:34 -07:00
Paul E. McKenney	35ce7f29a4	rcu: Create rcuo kthreads only for onlined CPUs RCU currently uses for_each_possible_cpu() to spawn rcuo kthreads, which can result in more rcuo kthreads than one would expect. For example, derRichard reported 64 CPUs worth of rcuo kthreads on an 8-CPU image. This commit therefore creates rcuo kthreads only for those CPUs that actually come online. This was reported by derRichard on the OFTC IRC network. Reported-by: Richard Weinberger <richard@nod.at> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2014-09-16 10:08:02 -07:00
Paul E. McKenney	9386c0b75d	rcu: Rationalize kthread spawning Currently, RCU spawns kthreads from several different early_initcall() functions. Although this has served RCU well for quite some time, as more kthreads are added a more deterministic approach is required. This commit therefore causes all of RCU's early-boot kthreads to be spawned from a single early_initcall() function. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2014-09-16 10:08:01 -07:00
Paul E. McKenney	284a8c93af	rcu: Per-CPU operation cleanups to rcu_*_qs() functions The rcu_bh_qs(), rcu_preempt_qs(), and rcu_sched_qs() functions use old-style per-CPU variable access and write to ->passed_quiesce even if it is already set. This commit therefore updates to use the new-style per-CPU variable access functions and avoids the spurious writes. This commit also eliminates the "cpu" argument to these functions because they are always invoked on the indicated CPU. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-09-07 16:27:35 -07:00
Paul E. McKenney	176f8f7a52	rcu: Make TASKS_RCU handle nohz_full= CPUs Currently TASKS_RCU would ignore a CPU running a task in nohz_full= usermode execution. There would be neither a context switch nor a scheduling-clock interrupt to tell TASKS_RCU that the task in question had passed through a quiescent state. The grace period would therefore extend indefinitely. This commit therefore makes RCU's dyntick-idle subsystem record the task_struct structure of the task that is running in dyntick-idle mode on each CPU. The TASKS_RCU grace period can then access this information and record a quiescent state on behalf of any CPU running in dyntick-idle usermode. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-09-07 16:27:30 -07:00
Paul E. McKenney	bde6c3aa99	rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops RCU-tasks requires the occasional voluntary context switch from CPU-bound in-kernel tasks. In some cases, this requires instrumenting cond_resched(). However, there is some reluctance to countenance unconditionally instrumenting cond_resched() (see http://lwn.net/Articles/603252/), so this commit creates a separate cond_resched_rcu_qs() that may be used in place of cond_resched() in locations prone to long-duration in-kernel looping. This commit currently instruments only RCU-tasks. Future possibilities include also instrumenting RCU, RCU-bh, and RCU-sched in order to reduce IPI usage. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-09-07 16:27:20 -07:00
Paul E. McKenney	8315f42295	rcu: Add call_rcu_tasks() This commit adds a new RCU-tasks flavor of RCU, which provides call_rcu_tasks(). This RCU flavor's quiescent states are voluntary context switch (not preemption!) and userspace execution (not the idle loop -- use some sort of schedule_on_each_cpu() if you need to handle the idle tasks). Note that unlike other RCU flavors, these quiescent states occur in tasks, not necessarily CPUs. Includes fixes from Steven Rostedt. This RCU flavor is assumed to have very infrequent latency-tolerant updaters. This assumption permits significant simplifications, including a single global callback list protected by a single global lock, along with a single task-private linked list containing all tasks that have not yet passed through a quiescent state. If experience shows this assumption to be incorrect, the required additional complexity will be added. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-09-07 16:27:19 -07:00
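Commit 8315f42295 assumes rare, latency-tolerant updaters. Under that assumption, the callback-queuing side of RCU-tasks can be as simple as one list and one lock. The sketch below uses POSIX threads. The `sketch_*` names are hypothetical, and the sketch deliberately leaves out the task-scanning grace-period wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical callback header, modeled loosely on struct rcu_head. */
struct sketch_head {
    struct sketch_head *next;
    void (*func)(struct sketch_head *head);
};

/* One global list protected by one global lock. */
static pthread_mutex_t sketch_cbs_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_head *sketch_cbs_head;
static struct sketch_head **sketch_cbs_tail = &sketch_cbs_head;

static void sketch_call_rcu_tasks(struct sketch_head *rhp,
                                  void (*func)(struct sketch_head *))
{
    rhp->next = NULL;
    rhp->func = func;
    pthread_mutex_lock(&sketch_cbs_lock);
    *sketch_cbs_tail = rhp;          /* append, preserving FIFO order */
    sketch_cbs_tail = &rhp->next;
    pthread_mutex_unlock(&sketch_cbs_lock);
}

/* Grace-period side: detach all pending callbacks at once. A real
 * implementation would then wait for every task to pass through a
 * voluntary context switch or userspace execution (omitted here). */
static struct sketch_head *sketch_take_all(void)
{
    pthread_mutex_lock(&sketch_cbs_lock);
    struct sketch_head *list = sketch_cbs_head;
    sketch_cbs_head = NULL;
    sketch_cbs_tail = &sketch_cbs_head;
    pthread_mutex_unlock(&sketch_cbs_lock);
    return list;
}

/* Invoke a detached list; returns the number of callbacks run. */
static int sketch_invoke_all(struct sketch_head *list)
{
    int n = 0;

    while (list) {
        struct sketch_head *next = list->next; /* func may free list */
        list->func(list);
        list = next;
        n++;
    }
    return n;
}

/* Example callback used only for demonstration. */
static int sketch_invoked;
static void sketch_count_cb(struct sketch_head *head)
{
    (void)head;
    sketch_invoked++;
}
```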
Paul E. McKenney	73a860cd58	rcu: Replace flush_signals() with WARN_ON(signal_pending()) Currently, when RCU awakens from a wait_event_interruptible() that might have awakened prematurely, it does a flush_signals(). This is done on the off-chance that someone figured out how to deliver a signal to a kthread, which is supposed to be impossible. Given that this is supposed to be impossible, this commit changes the flush_signals() calls into WARN_ON(signal_pending()). Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-09-07 16:18:20 -07:00
Pranith Kumar	2aa792e6fa	rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads The rcu_gp_kthread_wake() function checks for three conditions before waking up grace period kthreads: * Is the thread we are trying to wake up the current thread? * Are the gp_flags zero? (all threads wait on non-zero gp_flags condition) * Is there no thread created for this flavour, hence nothing to wake up? If any one of these conditions is true, we do not call wake_up(). It was found that there are quite a few avoidable wake ups both during idle time and under stress induced by rcutorture. Idle: Total:66000, unnecessary:66000, case1:61827, case2:66000, case3:0 Total:68000, unnecessary:68000, case1:63696, case2:68000, case3:0 rcutorture: Total:254000, unnecessary:254000, case1:199913, case2:254000, case3:0 Total:256000, unnecessary:256000, case1:201784, case2:256000, case3:0 Here case{1-3} are the cases listed above. We can avoid these wake ups by using rcu_gp_kthread_wake() to conditionally wake up the grace period kthreads. There is a comment about an implied barrier supplied by the wake_up() logic. This barrier is necessary for the awakened thread to see the updated ->gp_flags. This flag is always being updated with the root node lock held. Also, the awakened thread tries to acquire the root node lock before reading ->gp_flags because of which there is proper ordering. Hence this commit tries to avoid calling wake_up() whenever we can by using rcu_gp_kthread_wake() function. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-09-07 16:18:19 -07:00
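The three early-exit conditions listed in commit 2aa792e6fa can be expressed as a predicate. In this sketch, the `sketch_rsp` structure and the opaque task pointers are hypothetical simplifications of the kernel's `rcu_state` and `task_struct`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the per-flavor state consulted. */
struct sketch_rsp {
    const void *gp_kthread; /* NULL if no kthread was created */
    unsigned int gp_flags;  /* the kthread waits for this to become non-zero */
};

/* Returns true only if wake_up() would actually accomplish something. */
static bool sketch_should_wake_gp_kthread(const struct sketch_rsp *rsp,
                                          const void *current_task)
{
    if (current_task == rsp->gp_kthread) /* case 1: we are the kthread */
        return false;
    if (rsp->gp_flags == 0)              /* case 2: no work requested */
        return false;
    if (rsp->gp_kthread == NULL)         /* case 3: nothing to wake */
        return false;
    return true;
}
```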
Pranith Kumar	66d701ea7e	rcu: Remove stale comment in tree.c This commit removes a stale comment in rcu/tree.c which was left out when some code was moved around previously in commit `2036d94a7b` ("rcu: Rework detection of use of RCU by offline CPUs"). For reference, the following updated comment exists a few lines below this, which means the same thing: /* Remove the outgoing CPU from the masks in the rcu_node hierarchy. */ Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-09-07 16:18:16 -07:00
Ard Biesheuvel	a8a29b3b7b	rcu: Define tracepoint strings only if CONFIG_TRACING is set Commit `f7f7bac9cb` ("rcu: Have the RCU tracepoints use the tracepoint_string infrastructure") unconditionally populates the __tracepoint_str input section, but this section is not assigned an output section if CONFIG_TRACING is not set. This results in the __tracepoint_str turning up in unexpected places, i.e., after _edata. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-09-07 16:18:14 -07:00
Pranith Kumar	e02b2edfa1	rcu: Use true/false instead of 1/0 for a bool type This commit uses true/false instead of 1/0 for bool types in rcu_gp_fqs() and force_qs_rnp(). Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-09-07 16:18:12 -07:00
Pranith Kumar	f534ed1fd7	rcu: Use bool type for return value in rcu_is_watching() Use a bool type for return in rcu_is_watching(). Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-09-07 16:18:09 -07:00
Pranith Kumar	4de376a1b1	rcu: Remove remaining read-modify-write ACCESS_ONCE() calls Change the remaining uses of ACCESS_ONCE() so that each ACCESS_ONCE() either does a load or a store, but not both. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-09-07 16:18:07 -07:00
Paul E. McKenney	11992c703a	rcu: Remove CONFIG_PROVE_RCU_DELAY The CONFIG_PROVE_RCU_DELAY Kconfig parameter doesn't appear to be very effective at finding race conditions, so this commit removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> [ paulmck: Remove definition and uses as noted by Paul Bolle. ]	2014-07-09 09:15:31 -07:00
Shan Wei	d860d40327	rcu: Use __this_cpu_read() instead of per_cpu_ptr() The __this_cpu_read() function produces better code than does per_cpu_ptr() on both ARM and x86. For example, gcc (Ubuntu/Linaro 4.7.3-12ubuntu1) 4.7.3 produces the following:

ARMv7 per_cpu_ptr():
force_quiescent_state:
    mov r3, sp                  @,
    bic r1, r3, #8128           @ tmp171,,
    ldr r2, .L98                @ tmp169,
    bic r1, r1, #63             @ tmp170, tmp171,
    ldr r3, [r0, #220]          @ __ptr, rsp_6(D)->rda
    ldr r1, [r1, #20]           @ D.35903_68->cpu, D.35903_68->cpu
    mov r6, r0                  @ rsp, rsp
    ldr r2, [r2, r1, asl #2]    @ tmp173, __per_cpu_offset
    add r3, r3, r2              @ tmp175, __ptr, tmp173
    ldr r5, [r3, #12]           @ rnp_old, D.29162_13->mynode

ARMv7 __this_cpu_read():
force_quiescent_state:
    ldr r3, [r0, #220]          @ rsp_7(D)->rda, rsp_7(D)->rda
    mov r6, r0                  @ rsp, rsp
    add r3, r3, #12             @ __ptr, rsp_7(D)->rda,
    ldr r5, [r2, r3]            @ rnp_old, *D.29176_13

Using gcc 4.8.2:

x86_64 per_cpu_ptr():
    movl %gs:cpu_number,%edx                # cpu_number, pscr_ret__
    movslq %edx, %rdx                       # pscr_ret__, pscr_ret__
    movq __per_cpu_offset(,%rdx,8), %rdx    # __per_cpu_offset, tmp93
    movq %rdi, %r13                         # rsp, rsp
    movq 1000(%rdi), %rax                   # rsp_9(D)->rda, __ptr
    movq 24(%rdx,%rax), %r12                # _15->mynode, rnp_old

x86_64 __this_cpu_read():
    movq %rdi, %r13                         # rsp, rsp
    movq 1000(%rdi), %rax                   # rsp_9(D)->rda, rsp_9(D)->rda
    movq %gs:24(%rax),%r12                  # _10->mynode, rnp_old

Because this change produces significant benefits for these two very diverse architectures, this commit makes this change. Signed-off-by: Shan Wei <davidshan@tencent.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2014-07-09 09:15:21 -07:00
Paul E. McKenney	bc1dce514e	rcu: Don't use NMIs to dump other CPUs' stacks Although NMI-based stack dumps are in principle more accurate, they are also more likely to trigger deadlocks. This commit therefore replaces all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so that the CPU detecting an RCU CPU stall does the stack dumping. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2014-07-09 09:15:04 -07:00
Pranith Kumar	48bd8e9b82	rcu: Check both root and current rcu_node when setting up future grace period The rcu_start_future_gp() function checks the current rcu_node's ->gpnum and ->completed twice, once without ACCESS_ONCE() and once with it. This is pointless because we hold that rcu_node's ->lock at that point. The intent was to check the current rcu_node structure and the root rcu_node structure, the latter locklessly with ACCESS_ONCE(). This commit therefore makes that change. The reason that it is safe to locklessly check the root rcu_node's ->gpnum and ->completed fields is that we hold the current rcu_node's ->lock, which constrains the root rcu_node's ability to change its ->gpnum and ->completed fields. Of course, if there is a single rcu_node structure, then rnp_root==rnp, and holding the lock prevents all changes. If there is more than one rcu_node structure, then the code updates the fields in the following order:

1. Increment rnp_root->gpnum to start new grace period.
2. Increment rnp->gpnum to initialize the current rcu_node, continuing initialization for the new grace period.
3. Increment rnp_root->completed to end the current grace period.
4. Increment rnp->completed to continue cleaning up after the old grace period.

So there are four possible combinations of relative values of these four fields, listed in the order rnp_root->gpnum, rnp_root->completed, rnp->gpnum, rnp->completed:

N N N N: RCU idle, new grace period must be initiated. Although rnp_root->gpnum might be incremented immediately after we check, that will just result in unnecessary work: the grace period has already started, and we try to start it again.
N+1 N N N: RCU grace period just started. No further change is possible because we hold rnp->lock, so the checks of rnp_root->gpnum and rnp_root->completed are stable. We know that our request for a future grace period will be seen during grace-period cleanup.
N+1 N N+1 N: RCU grace period is ongoing. Because rnp->gpnum is different than rnp->completed, we won't even look at rnp_root->gpnum and rnp_root->completed, so the possible concurrent change to rnp_root->completed does not matter. We know that our request for a future grace period will be seen during grace-period cleanup, which cannot pass this rcu_node because we hold its ->lock.
N+1 N+1 N+1 N: RCU grace period has ended, but not yet been cleaned up. Because rnp->gpnum is different than rnp->completed, we won't look at rnp_root->gpnum and rnp_root->completed, so the possible concurrent change to rnp_root->completed does not matter. We know that our request for a future grace period will be seen during grace-period cleanup, which cannot pass this rcu_node because we hold its ->lock.

Therefore, despite initial appearances, the lockless check is safe. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> [ paulmck: Update comment to say why the lockless check is safe. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-07-09 09:15:01 -07:00
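The four-state argument in commit 48bd8e9b82 can be checked mechanically. The snapshot fields below are kept in the order rnp_root->gpnum, rnp_root->completed, rnp->gpnum, rnp->completed, which is the order implied by the four listed combinations. The sketch uses hypothetical names and plain C, and it illustrates the reasoning rather than reproducing the kernel's code. It applies the four updates in the stated order. After each step it asserts that whenever `rnp->gpnum == rnp->completed`, the root's fields look either idle or just-started.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the four fields discussed in the commit. */
struct sketch_gp_snap {
    unsigned long root_gpnum, root_completed, rnp_gpnum, rnp_completed;
};

/* Apply update step 1..4 in the order given in the commit message. */
static void sketch_apply_step(struct sketch_gp_snap *s, int step)
{
    switch (step) {
    case 1: s->root_gpnum++; break;     /* start new grace period */
    case 2: s->rnp_gpnum++; break;      /* initialize this rcu_node */
    case 3: s->root_completed++; break; /* end grace period at the root */
    case 4: s->rnp_completed++; break;  /* clean up this rcu_node */
    }
}

/* The lockless root check only matters when rnp->gpnum == rnp->completed;
 * in that case, require the root to look either idle or just-started. */
static bool sketch_root_check_consistent(const struct sketch_gp_snap *s)
{
    if (s->rnp_gpnum != s->rnp_completed)
        return true; /* root fields are not examined at all */
    return s->root_completed == s->rnp_completed &&
           (s->root_gpnum == s->root_completed ||
            s->root_gpnum == s->root_completed + 1);
}
```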
Paul E. McKenney	1146edcbef	rcu: Loosen __call_rcu()'s rcu_head alignment constraint The m68k architecture aligns only to 16-bit boundaries, which can cause the align-to-32-bits check in __call_rcu() to trigger. Because there is currently no known potential need for more than one low-order bit, this commit loosens the check to 16-bit boundaries. Reported-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2014-07-09 09:14:50 -07:00
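The loosened check in commit 1146edcbef can be sketched as a test on the low-order bits of the `rcu_head` address. The helper names are hypothetical. The masks follow the commit's description, with 32-bit alignment before the change and 16-bit alignment after it. It is an assumption that the check is a simple mask test of this form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, old rule: rcu_head must be 32-bit (4-byte) aligned,
 * so the two low-order address bits must be zero. This trips on m68k,
 * which aligns only to 16-bit boundaries. */
static bool sketch_head_ok_old_rule(const void *head)
{
    return ((uintptr_t)head & 0x3) == 0;
}

/* Hypothetical helper, loosened rule: 16-bit (2-byte) alignment suffices,
 * keeping just one low-order bit clear. */
static bool sketch_head_ok_new_rule(const void *head)
{
    return ((uintptr_t)head & 0x1) == 0;
}
```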
Paul E. McKenney	a792563bd4	rcu: Eliminate read-modify-write ACCESS_ONCE() calls RCU contains code of the following forms: ACCESS_ONCE(x)++; ACCESS_ONCE(x) += y; ACCESS_ONCE(x) -= y; Now these constructs do operate correctly, but they really result in a pair of volatile accesses, one to do the load and another to do the store. This can be confusing, as the casual reader might well assume that (for example) gcc might generate a memory-to-memory add instruction for each of these three cases. In fact, gcc will do no such thing. Also, there is a good chance that the kernel will move to separate load and store variants of ACCESS_ONCE(), and constructs like the above could easily confuse both people and scripts attempting to make that sort of change. Finally, most of RCU's read-modify-write uses of ACCESS_ONCE() really only need the store to be volatile, so that the read-modify-write form might be misleading. This commit therefore changes the above forms in RCU so that each instance of ACCESS_ONCE() either does a load or a store, but not both. In a few cases, ACCESS_ONCE() was not critical, for example, for maintaining statistics. In these cases, ACCESS_ONCE() has been dispensed with entirely. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-07-09 09:14:49 -07:00
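The load/store split from commits a792563bd4 and 4de376a1b1 can be shown with a macro. It is written the same way as the kernel's classic `ACCESS_ONCE()`: a cast through a `volatile` pointer, using the GCC/Clang `__typeof__` extension. The counter and helper names are hypothetical. Neither form is an atomic read-modify-write, so concurrent updaters would still need a lock or atomic operations.

```c
#include <assert.h>

/* Sketch macro in the style of the classic ACCESS_ONCE(): access x
 * through a volatile-qualified lvalue (GNU __typeof__ extension). */
#define SKETCH_ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static unsigned long sketch_counter;

/* Before: SKETCH_ACCESS_ONCE(sketch_counter) += y; hides a volatile
 * load AND a volatile store behind one expression. */

/* After: one explicit volatile load, one explicit volatile store. */
static void sketch_inc(unsigned long y)
{
    unsigned long tmp = SKETCH_ACCESS_ONCE(sketch_counter); /* load */

    SKETCH_ACCESS_ONCE(sketch_counter) = tmp + y;           /* store */
}

/* When updates are serialized (e.g. under a lock) and only lockless
 * readers need protection, a plain read plus a volatile store suffices. */
static void sketch_inc_store_only(unsigned long y)
{
    SKETCH_ACCESS_ONCE(sketch_counter) = sketch_counter + y;
}
```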
Fabian Frederick	b4426b49c6	rcu: Make rcu node arrays static const char * const Those two arrays are being passed to lockdep_init_map(), which expects const char *, and are stored in lockdep_map the same way. Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-07-09 09:14:34 -07:00
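A minimal sketch of the `static const char * const` pattern follows. The map structure and init function are hypothetical stand-ins for lockdep's, and the name strings are illustrative. The point is that a function taking `const char *` accepts elements of such an array without any casts.

```c
/* Hypothetical stand-in for struct lockdep_map: it keeps only the name. */
struct fake_lockdep_map {
	const char *name;
};

/* Takes const char *, as the commit says lockdep_init_map() does. */
static void fake_lockdep_init_map(struct fake_lockdep_map *map,
				  const char *name)
{
	map->name = name;
}

/*
 * Neither the strings nor the array elements can be modified, so the
 * whole table can live in read-only data.
 */
static const char * const node_lock_names[] = {
	"rcu_node_0", "rcu_node_1", "rcu_node_2", "rcu_node_3",
};

#define N_NODE_LOCK_NAMES \
	(sizeof(node_lock_names) / sizeof(node_lock_names[0]))
```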
Paul E. McKenney	4a81e8328d	rcu: Reduce overhead of cond_resched() checks for RCU Commit `ac1bea8578` (Make cond_resched() report RCU quiescent states) fixed a problem where a CPU looping in the kernel with but one runnable task would give RCU CPU stall warnings, even if the in-kernel loop contained cond_resched() calls. Unfortunately, in so doing, it introduced performance regressions in Anton Blanchard's will-it-scale "open1" test. The problem appears to be not so much the increased cond_resched() path length as an increase in the rate at which grace periods complete, which increased per-update grace-period overhead. This commit takes a different approach to fixing this bug, mainly by moving the RCU-visible quiescent state from cond_resched() to rcu_note_context_switch(), and by further reducing the check to a simple non-zero test of a single per-CPU variable. However, this approach requires that the force-quiescent-state processing send resched IPIs to the offending CPUs. These will be sent only once the grace period has reached an age specified by the boot/sysfs parameter rcutree.jiffies_till_sched_qs, or once the grace period reaches an age halfway to the point at which RCU CPU stall warnings will be emitted, whichever comes first. Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@gentwo.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the ktest build robot. Also fixed smp_mb() comment as noted by Oleg Nesterov. ] Merge with e552592e (Reduce overhead of cond_resched() checks for RCU) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-06-23 11:19:32 -07:00
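The timing rule for the resched IPIs amounts to taking the smaller of two thresholds. A minimal sketch, where the function and parameter names are hypothetical and all times are in jiffies:

```c
#include <stdbool.h>

/*
 * Should force-quiescent-state processing send a resched IPI to a CPU
 * that has not yet reported a quiescent state? IPIs start once the
 * grace period is jiffies_till_sched_qs old or halfway to the RCU CPU
 * stall warning, whichever comes first.
 */
static bool should_send_resched_ipi(unsigned long gp_age,
				    unsigned long jiffies_till_sched_qs,
				    unsigned long stall_timeout)
{
	unsigned long half_stall = stall_timeout / 2;
	unsigned long threshold = jiffies_till_sched_qs < half_stall ?
				  jiffies_till_sched_qs : half_stall;

	return gp_age >= threshold;
}
```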
Linus Torvalds	776edb5931	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull core locking updates from Ingo Molnar: "The main changes in this cycle were: - reduced/streamlined smp_mb__*() interface that allows more usecases and makes the existing ones less buggy, especially in rarer architectures - add rwsem implementation comments - bump up lockdep limits" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) rwsem: Add comments to explain the meaning of the rwsem's count field lockdep: Increase static allocations arch: Mass conversion of smp_mb__*() arch,doc: Convert smp_mb__*() arch,xtensa: Convert smp_mb__*() arch,x86: Convert smp_mb__*() arch,tile: Convert smp_mb__*() arch,sparc: Convert smp_mb__*() arch,sh: Convert smp_mb__*() arch,score: Convert smp_mb__*() arch,s390: Convert smp_mb__*() arch,powerpc: Convert smp_mb__*() arch,parisc: Convert smp_mb__*() arch,openrisc: Convert smp_mb__*() arch,mn10300: Convert smp_mb__*() arch,mips: Convert smp_mb__*() arch,metag: Convert smp_mb__*() arch,m68k: Convert smp_mb__*() arch,m32r: Convert smp_mb__*() arch,ia64: Convert smp_mb__*() ...	2014-06-03 12:57:53 -07:00
Uma Sharma	e534165bbf	rcu: Variable name changed in tree_plugin.h and used in tree.c The variable and struct both having the name "rcu_state" confuses sparse in some situations, so this commit changes the variable to "rcu_state_p" in order to avoid this confusion. This also makes things easier for human readers. Signed-off-by: Uma Sharma <uma.sharma523@gmail.com> [ paulmck: Changed the declaration and several additional uses. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-05-14 11:41:04 -07:00
Paul E. McKenney	f5d2a0450d	Merge branches 'doc.2014.04.29a', 'fixes.2014.04.29a' and 'torture.2014.05.14a' into HEAD doc.2014.04.29a: Documentation updates. fixes.2014.04.29a: Miscellaneous fixes. torture.2014.05.14a: RCU/Lock torture tests.	2014-05-14 10:57:31 -07:00
Paul E. McKenney	afea227fd4	rcutorture: Export RCU grace-period kthread wait state to rcutorture This commit allows rcutorture to print additional state for the RCU grace-period kthreads in cases where RCU seems reluctant to start a new grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-05-14 09:46:09 -07:00
Paul E. McKenney	ad0dc7f94d	rcutorture: Add forward-progress checking for writer The rcutorture output currently does not distinguish between stalls in the RCU implementation and stalls in the rcu_torture_writer() kthreads. This commit therefore adds some diagnostics to help distinguish between these two conditions, at least for the non-SRCU implementations. (SRCU does not provide evidence of update-side forward progress by design.) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-05-13 11:18:18 -07:00
Christoph Lameter	fa07a58f71	rcu: Replace __this_cpu_ptr() uses with raw_cpu_ptr() __this_cpu_ptr is being phased out. One special case is increment_cpu_stall_ticks(). A per cpu variable is incremented so use raw_cpu_inc(). Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-04-29 08:45:35 -07:00
Pranith Kumar	8c96ae1dfa	rcu: Remove duplicate resched_cpu() declaration Signed-off-by: Pranith Kumar <pranith@gatech.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-04-29 08:45:29 -07:00
Andreea-Cristina Bernat	a381d757d9	rcu: Merge rcu_sched_force_quiescent_state() with rcu_force_quiescent_state() This patch merges the function rcu_force_quiescent_state() with rcu_sched_force_quiescent_state(), using the rcu_state pointer. Firstly, the rcu_sched_force_quiescent_state() function is deleted from the file kernel/rcu/tree.c. Also, the rcu_force_quiescent_state() function that was calling force_quiescent_state with the argument rcu_preempt_state pointer was deleted as well. The new function that combines the old ones uses the rcu_state pointer and is located after rcu_batches_completed_bh() in kernel/rcu/tree.c. Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-04-29 08:45:07 -07:00
Andreea-Cristina Bernat	495aa969db	rcu: Consolidate kfree_call_rcu() to use rcu_state pointer kfree_call_rcu() is defined twice. When defined under CONFIG_TREE_PREEMPT_RCU, it uses rcu_preempt_state. Otherwise, it uses rcu_sched_state. This patch uses the rcu_state pointer to combine the two definitions into one. The resulting function is placed after the closing of the preprocessor conditional CONFIG_TREE_PREEMPT_RCU. Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-04-29 08:45:01 -07:00
Himangi Saraogi	595f3900f6	rcu: Replace NR_CPUS with nr_cpu_ids This patch replaces NR_CPUS with nr_cpu_ids as NR_CPUS should consider cpumask_var_t. Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-04-29 08:44:55 -07:00
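The commit text is terse, so the following is only a general illustration of the idea, not the patch itself: NR_CPUS is a compile-time maximum, while nr_cpu_ids bounds the CPU numbers actually possible at runtime, so loops bounded by nr_cpu_ids skip slots for CPUs that can never exist. Both names are hard-coded stand-ins here, not the kernel's definitions.

```c
/* Stand-ins: compile-time maximum versus runtime bound found at boot. */
#define NR_CPUS 64
static unsigned int nr_cpu_ids = 4;

static unsigned long per_cpu_counts[NR_CPUS];

/* Iterate only over CPU numbers that can exist on this system. */
static unsigned long sum_counts(void)
{
	unsigned long sum = 0;

	for (unsigned int cpu = 0; cpu < nr_cpu_ids; cpu++)
		sum += per_cpu_counts[cpu];
	return sum;
}
```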
Andreea-Cristina Bernat	7941dbdebe	rcu: Add event tracing to dyntick_save_progress_counter(). This patch adds event tracing to dyntick_save_progress_counter() in the case where it returns 1. I used the tracepoint string "dti" because this function returns 1 in case the CPU is in dynticks idle mode. Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-04-29 08:44:49 -07:00
Paul E. McKenney	48a7639ce8	rcu: Make callers awaken grace-period kthread The rcu_start_gp_advanced() function currently uses irq_work_queue() to defer wakeups of the RCU grace-period kthread. This deferring is necessary to avoid RCU-scheduler deadlocks involving the rcu_node structure's lock, meaning that RCU cannot call any of the scheduler's wake-up functions while holding one of these locks. Unfortunately, the second and subsequent calls to irq_work_queue() are ignored, and the first call will be ignored (aside from queuing the work item) if the scheduler-clock tick is turned off. This is OK for many uses, especially those where irq_work_queue() is called from an interrupt or softirq handler, because in those cases the scheduler-clock-tick state will be re-evaluated, which will turn the scheduler-clock tick back on. On the next tick, any deferred work will then be processed. However, this strategy does not always work for RCU, which can be invoked at process level from idle CPUs. In this case, the tick might never be turned back on, indefinitely deferring a grace-period start request. Note that the RCU CPU stall detector cannot see this condition, because there is no RCU grace period in progress. Therefore, we can (and do!) see long tens-of-seconds stalls in grace-period handling. In theory, we could see a full grace-period hang, but rcutorture testing to date has seen only the tens-of-seconds stalls. Event tracing demonstrates that irq_work_queue() is being called repeatedly to no effect during these stalls: The "newreq" event appears repeatedly from a task that is not one of the grace-period kthreads. In theory, irq_work_queue() might be fixed to avoid this sort of issue, but RCU's requirements are unusual and it is quite straightforward to pass wake-up responsibility up through RCU's call chain, so that the wakeup happens when the offending locks are released. This commit therefore makes this change. 
The rcu_start_gp_advanced(), rcu_start_future_gp(), rcu_accelerate_cbs(), rcu_advance_cbs(), __note_gp_changes(), and rcu_start_gp() functions now return a boolean which indicates when a wake-up is needed. A new rcu_gp_kthread_wake() does the wakeup when it is necessary and safe to do so: No self-wakes, no wake-ups if the ->gp_flags field indicates there is no need (as in someone else did the wake-up before we got around to it), and no wake-ups before the grace-period kthread has been created. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-04-29 08:44:07 -07:00
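The return-a-boolean approach can be modeled in a single-threaded sketch. The structure, `start_gp_locked()`, `request_gp()`, and the `lock_held` flag are hypothetical; `gp_kthread_wake()` only mirrors the three conditions the commit lists for `rcu_gp_kthread_wake()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the state involved. */
struct fake_rcu_state {
	bool lock_held;      /* stands in for an rcu_node ->lock */
	int gp_flags;        /* nonzero: the kthread has work to do */
	bool kthread_exists; /* has the grace-period kthread been created? */
	int wakeups;         /* wakeups actually performed */
};

/*
 * Called with the lock held. Instead of waking the kthread (which
 * could deadlock against the scheduler), record the request and tell
 * the caller that a wakeup is needed.
 */
static bool start_gp_locked(struct fake_rcu_state *rsp)
{
	assert(rsp->lock_held);
	rsp->gp_flags = 1;
	return true;
}

/*
 * No self-wakes, no wakeup if ->gp_flags shows nothing to do, and no
 * wakeup before the kthread exists. Called only after lock release.
 */
static void gp_kthread_wake(struct fake_rcu_state *rsp, bool is_self)
{
	assert(!rsp->lock_held);
	if (is_self || !rsp->gp_flags || !rsp->kthread_exists)
		return;
	rsp->wakeups++;
}

/* Caller pattern: lock, do the work, unlock, then wake if asked to. */
static void request_gp(struct fake_rcu_state *rsp, bool is_self)
{
	bool needwake;

	rsp->lock_held = true;
	needwake = start_gp_locked(rsp);
	rsp->lock_held = false;
	if (needwake)
		gp_kthread_wake(rsp, is_self);
}
```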
Iulia Manda	4fc5b75537	rcu: Protect uses of jiffies_stall field with ACCESS_ONCE() Some of the uses of the rcu_state structure's ->jiffies_stall field do not use ACCESS_ONCE(), despite there being unprotected accesses. This commit therefore uses the ACCESS_ONCE() macro to protect this field. Signed-off-by: Iulia Manda <iulia.manda21@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-04-29 08:43:45 -07:00
Iulia Manda	9b67122ae3	rcu: Remove unused rcu_data structure field The ->preemptible field in rcu_data is only initialized in the function rcu_init_percpu_data(), and never used. This commit therefore removes this field. Signed-off-by: Iulia Manda <iulia.manda21@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-04-29 08:43:38 -07:00
Paul E. McKenney	365187fbc0	rcu: Update cpu_needs_another_gp() for futures from non-NOCB CPUs In the old days, the only source of requests for future grace periods was NOCB CPUs. This has changed: CPUs routinely post requests for future grace periods in order to promote power efficiency and reduce OS jitter with minimal impact on grace-period latency. This commit therefore updates cpu_needs_another_gp() to invoke rcu_future_needs_gp() instead of rcu_nocb_needs_gp(). The latter is no longer used, so is now removed. This commit also adds tracing for the irq_work_queue() wakeup case. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-04-29 08:43:32 -07:00
Paul E. McKenney	83ebe63ead	rcu: Print negatives for stall-warning counter wraparound The print_other_cpu_stall() and print_cpu_stall() functions print grace-period numbers using an unsigned format, which means that the number one less than zero is a very large number. This commit therefore causes these numbers to be printed with a signed format in order to improve readability of the RCU CPU stall-warning output. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-04-29 08:43:26 -07:00
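The formatting difference is easy to demonstrate. In this sketch the helper name is hypothetical; it formats a grace-period number either as unsigned or, after a cast to long, as signed.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Grace-period numbers are unsigned long. With %lu, the number one less
 * than zero prints as a huge value; cast to long and printed with %ld,
 * it prints as -1. Returns the number of characters formatted.
 */
static int format_gpnum(char *buf, size_t len, unsigned long gpnum,
			bool as_signed)
{
	if (as_signed)
		return snprintf(buf, len, "%ld", (long)gpnum);
	return snprintf(buf, len, "%lu", gpnum);
}
```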
Paul E. McKenney	91dc95427a	rcu: Protect ->gp_flags accesses with ACCESS_ONCE() A number of ->gp_flags accesses don't have ACCESS_ONCE(), but all of them can race against other loads or stores. This commit therefore applies ACCESS_ONCE() to the unprotected ->gp_flags accesses. Reported-by: Alexey Roytman <alexey.roytman@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-04-29 08:42:31 -07:00
Peter Zijlstra	4e857c58ef	arch: Mass conversion of smp_mb__*() Mostly scripted conversion of the smp_mb__* barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-18 14:20:48 +02:00
Paul E. McKenney	765a3f4fed	rcu: Provide grace-period piggybacking API The following pattern is currently not well supported by RCU: 1. Make data element inaccessible to RCU readers. 2. Do work that probably lasts for more than one grace period. 3. Do something to make sure RCU readers in flight before #1 above have completed. Here are some things that could currently be done: a. Do a synchronize_rcu() unconditionally at either #1 or #3 above. This works, but imposes needless work and latency. b. Post an RCU callback at #1 above that does a wakeup, then wait for the wakeup at #3. This works well, but likely results in an extra unneeded grace period. Open-coding this is also a bit more semi-tricky code than would be good. This commit therefore adds get_state_synchronize_rcu() and cond_synchronize_rcu() APIs. Call get_state_synchronize_rcu() at #1 above and pass its return value to cond_synchronize_rcu() at #3 above. This results in a call to synchronize_rcu() if no grace period has elapsed between #1 and #3, but requires only a load, comparison, and memory barrier if a full grace period did elapse. Requested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org>	2014-03-20 17:12:25 -07:00
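The snapshot-then-conditionally-wait pattern can be modeled in userspace. This is a single-threaded sketch: the `fake_` functions only mirror the names in the commit, grace periods complete instantly, and the memory barriers the real implementation needs are omitted. The snapshot records the current grace-period number, and the later check skips the wait only if a grace period that started after the snapshot has completed.

```c
/* gpnum counts grace periods started; completed counts those finished. */
static unsigned long gpnum, completed;
static int full_syncs; /* how often the slow path ran */

/* Simulate one complete grace period. */
static void fake_grace_period(void)
{
	gpnum++;
	completed = gpnum;
}

static void fake_synchronize_rcu(void)
{
	full_syncs++;
	fake_grace_period();
}

/* Step #1: snapshot the current grace-period number. */
static unsigned long fake_get_state_synchronize_rcu(void)
{
	return gpnum;
}

/*
 * Step #3: wait unless a grace period started after the snapshot has
 * completed. The signed difference tolerates counter wraparound.
 */
static void fake_cond_synchronize_rcu(unsigned long oldstate)
{
	if ((long)(completed - oldstate) <= 0)
		fake_synchronize_rcu();
}
```

In use, a caller takes the snapshot when unpublishing an element and passes it to `fake_cond_synchronize_rcu()` before freeing; if a full grace period elapsed meanwhile, only a load and comparison remain.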
Paul E. McKenney	322efba5b6	Merge branches 'doc.2014.02.24a', 'fixes.2014.02.26a' and 'rt.2014.02.17b' into HEAD doc.2014.02.24a: Documentation changes fixes.2014.02.26a: Miscellaneous fixes rt.2014.02.17b: Response-time-related changes	2014-02-26 06:36:09 -08:00
Paul Gortmaker	5cb5c6e18f	rcu: Ensure kernel/rcu/rcu.h can be sourced/used stand-alone The kbuild test bot uncovered an implicit dependence on the trace header being present before rcu.h in ia64 allmodconfig that looks like this: In file included from kernel/ksysfs.c:22:0: kernel/rcu/rcu.h: In function '__rcu_reclaim': kernel/rcu/rcu.h:107:3: error: implicit declaration of function 'trace_rcu_invoke_kfree_callback' [-Werror=implicit-function-declaration] kernel/rcu/rcu.h:112:3: error: implicit declaration of function 'trace_rcu_invoke_callback' [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors Looking at other rcu.h users, we can find that they all were sourcing the trace header in advance of rcu.h itself, as seen in the context of this diff. There were also some inconsistencies as to whether it was or wasn't sourced based on the parent tracing Kconfig. Rather than "fix" it at each use site, and have inconsistent use based on whether "#ifdef CONFIG_RCU_TRACE" was used or not, let's just source the trace header just once, in the actual consumer of it, which is rcu.h itself. We include it unconditionally, as build testing shows us that is a hard requirement for some files. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2014-02-26 06:35:18 -08:00
Paul E. McKenney	ffa83fb565	rcu: Optimize rcu_needs_cpu() for RCU_NOCB_CPU_ALL If CONFIG_RCU_NOCB_CPU_ALL=y, then rcu_needs_cpu() will always return false; however, the current version nevertheless checks for RCU callbacks. This commit therefore creates a static inline implementation of rcu_needs_cpu() that unconditionally returns false when CONFIG_RCU_NOCB_CPU_ALL=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-02-17 16:03:09 -08:00
Paul E. McKenney	cb1e78cfa2	rcu: Remove ACCESS_ONCE() from jiffies Because jiffies is one of a very few variables marked "volatile", there is no need to use ACCESS_ONCE() when accessing it. This commit therefore removes the redundant ACCESS_ONCE() wrappers. Reported by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-02-17 15:01:42 -08:00
Paul E. McKenney	87de1cfdc5	rcu: Stop tracking FSF's postal address All of the RCU source files have the usual GPL header, which contains a long-obsolete postal address for FSF. To avoid the need to track the FSF office's movements, this commit substitutes the URL where GPL may be found. Reported-by: Greg KH <gregkh@linuxfoundation.org> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-02-17 15:01:37 -08:00
Paul E. McKenney	3660c2813f	rcu: Add ACCESS_ONCE() to ->n_force_qs_lh accesses The ->n_force_qs_lh field is accessed without the benefit of any synchronization, so this commit adds the needed ACCESS_ONCE() wrappers. Yes, increments to ->n_force_qs_lh can be lost, but contention should be low and the field is strictly statistical in nature, so this is not a problem. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2014-02-17 15:01:10 -08:00
Linus Torvalds	a693c46e14	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: - add RCU torture scripts/tooling - static analysis improvements - update RCU documentation - miscellaneous fixes * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) rcu: Remove "extern" from function declarations in kernel/rcu/rcu.h rcu: Remove "extern" from function declarations in include/linux/rcu.h rcu/torture: Dynamically allocate SRCU output buffer to avoid overflow rcu: Don't activate RCU core on NO_HZ_FULL CPUs rcu: Warn on allegedly impossible rcu_read_unlock_special() from irq rcu: Add an RCU_INITIALIZER for global RCU-protected pointers rcu: Make rcu_assign_pointer's assignment volatile and type-safe bonding: Use RCU_INIT_POINTER() for better overhead and for sparse rcu: Add comment on evaluate-once properties of rcu_assign_pointer(). rcu: Provide better diagnostics for blocking in RCU callback functions rcu: Improve SRCU's grace-period comments rcu: Fix CONFIG_RCU_FANOUT_EXACT for odd fanout/leaf values rcu: Fix coccinelle warnings rcutorture: Stop tracking FSF's postal address rcutorture: Move checkarg to functions.sh rcutorture: Flag errors and warnings with color coding rcutorture: Record results from repeated runs of the same test scenario rcutorture: Test summary at end of run with less chattiness rcutorture: Update comment in kvm.sh listing typical RCU trace events rcutorture: Add tracing-enabled version of TREE08 ...	2014-01-20 10:25:12 -08:00
Paul E. McKenney	6303b9c87d	rcu: Apply smp_mb__after_unlock_lock() to preserve grace periods RCU must ensure that there is the equivalent of a full memory barrier between any memory access preceding a grace period and any memory access following that same grace period, regardless of which CPU(s) happen to execute the two memory accesses. Therefore, downgrading UNLOCK+LOCK to no longer imply a full memory barrier requires some adjustments to RCU. This commit therefore adds smp_mb__after_unlock_lock() invocations as needed after the RCU lock acquisitions that need to be part of a full-memory-barrier UNLOCK+LOCK. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1386799151-2219-7-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-16 11:36:16 +01:00
Paul E. McKenney	a096932f0c	rcu: Don't activate RCU core on NO_HZ_FULL CPUs Whenever a CPU receives a scheduling-clock interrupt, RCU checks to see if the RCU core needs anything from this CPU. If so, RCU raises RCU_SOFTIRQ to carry out any needed processing. This approach has worked well historically, but it is undesirable on NO_HZ_FULL CPUs. Such CPUs are expected to spend almost all of their time in userspace, so that scheduling-clock interrupts can be disabled while there is only one runnable task on the CPU in question. Unfortunately, raising any softirq has the potential to wake up ksoftirqd, which would provide the second runnable task on that CPU, preventing disabling of scheduling-clock interrupts. What is needed instead is for RCU to leave NO_HZ_FULL CPUs alone, relying on the grace-period kthreads' quiescent-state forcing to do any needed RCU work on behalf of those CPUs. This commit therefore refrains from raising RCU_SOFTIRQ on any NO_HZ_FULL CPUs during any grace periods that have been in effect for less than one second. The one-second limit handles the case where an inappropriate workload is running on a NO_HZ_FULL CPU that features lots of scheduling-clock interrupts, but no idle or userspace time. Reported-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Mike Galbraith <bitbucket@online.de> Toasted-by: Frederic Weisbecker <fweisbec@gmail.com>	2013-12-12 12:34:15 -08:00
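The decision described above can be sketched as a predicate. The function and parameter names are hypothetical, and HZ is an assumed tick rate for this sketch. On a NO_HZ_FULL CPU, the tick path declines to raise RCU_SOFTIRQ while the current grace period is younger than one second (HZ jiffies), leaving the work to the grace-period kthread's quiescent-state forcing.

```c
#include <stdbool.h>

#define HZ 1000 /* assumed tick rate for this sketch */

static bool should_raise_rcu_softirq(bool cpu_is_nohz_full,
				     bool rcu_core_needs_cpu,
				     unsigned long gp_start,
				     unsigned long now)
{
	if (!rcu_core_needs_cpu)
		return false;
	/* Young grace period on a NO_HZ_FULL CPU: leave it alone. */
	if (cpu_is_nohz_full && now - gp_start < HZ)
		return false;
	return true;
}
```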
Paul E. McKenney	04f34650ca	rcu: Fix CONFIG_RCU_FANOUT_EXACT for odd fanout/leaf values Each element of the rcu_state structure's ->levelspread[] array is intended to contain the per-level fanout, where the zero-th element corresponds to the root of the rcu_node tree, and the last element corresponds to the leaves. In the CONFIG_RCU_FANOUT_EXACT case, this means that the last element should be filled in from CONFIG_RCU_FANOUT_LEAF (or from the rcu_fanout_leaf boot parameter, if provided) and that the remaining elements should be filled in from CONFIG_RCU_FANOUT. Unfortunately, the current code in rcu_init_levelspread() takes the opposite approach, placing CONFIG_RCU_FANOUT_LEAF in the zero-th element and CONFIG_RCU_FANOUT in the remaining elements. For typical power-of-two values, this generates odd but functional rcu_node trees. However, other values, for example CONFIG_RCU_FANOUT=3 and CONFIG_RCU_FANOUT_LEAF=2, generate trees that can leave some CPUs out of the grace-period computation, resulting in too-short grace periods and therefore a broken RCU implementation. This commit therefore fixes rcu_init_levelspread() to set the last ->levelspread[] array element from CONFIG_RCU_FANOUT_LEAF and the remaining elements from CONFIG_RCU_FANOUT, thus generating the intended rcu_node trees. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-12-09 15:12:38 -08:00
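The corrected fill order can be sketched with the commit's own example values (CONFIG_RCU_FANOUT=3, CONFIG_RCU_FANOUT_LEAF=2) for a three-level tree. The function signature is hypothetical; only the root-to-leaf ordering comes from the commit text.

```c
/*
 * levelspread[0] is the root level and levelspread[levels - 1] is the
 * leaf level. fanout and fanout_leaf stand in for CONFIG_RCU_FANOUT and
 * CONFIG_RCU_FANOUT_LEAF (or the rcu_fanout_leaf boot parameter).
 */
static void init_levelspread(int *levelspread, int levels, int fanout,
			     int fanout_leaf)
{
	for (int i = 0; i < levels - 1; i++)
		levelspread[i] = fanout;           /* interior levels */
	levelspread[levels - 1] = fanout_leaf;     /* leaf level */
}
```

With those values the array becomes {3, 3, 2}, whereas the buggy order described above would have produced {2, 3, 3}.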
Fengguang Wu	f6f7ee9af7	rcu: Fix coccinelle warnings This commit fixes the following coccinelle warning: kernel/rcu/tree.c:712:9-10: WARNING: return of 0/1 in function 'rcu_lockdep_current_cpu_online' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Generated by: coccinelle/misc/boolreturn.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-12-09 15:12:25 -08:00
Paul E. McKenney	3947909814	rcu: Let the world know when RCU adjusts its geometry Some RCU bugs have been specific to the layout of the rcu_node tree, but RCU will silently adjust the tree at boot time if appropriate. This obscures valuable debugging information, so print a message when this happens. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-12-03 10:10:19 -08:00
Paul E. McKenney	3a5924052a	rcu: Allow task-level idle entry/exit nesting The current task-level idle entry/exit code forces an entry/exit on each call, regardless of the nesting level. This commit therefore properly accounts for nesting. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>	2013-12-03 10:10:19 -08:00
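Nesting can be handled with a depth counter so that only the outermost call performs a real state transition. This is a generic sketch: the structure and function names are hypothetical, and the kernel's actual encoding of its nesting counter is not reproduced. Here the counter tracks nested non-idle sections, so leaving idle increments it and entering idle decrements it.

```c
#include <assert.h>

struct idle_tracker {
	int nonidle_nesting; /* 0 means RCU treats the CPU as idle */
	int transitions;     /* real idle <-> non-idle switches */
};

/* Leave idle: only the outermost call performs the real transition. */
static void idle_exit(struct idle_tracker *t)
{
	if (t->nonidle_nesting++ == 0)
		t->transitions++;
}

/* Enter idle: only the matching outermost call performs the transition. */
static void idle_enter(struct idle_tracker *t)
{
	assert(t->nonidle_nesting > 0); /* unbalanced call would be a bug */
	if (--t->nonidle_nesting == 0)
		t->transitions++;
}
```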
Paul E. McKenney	96d3fd0d31	rcu: Break call_rcu() deadlock involving scheduler and perf Dave Jones got the following lockdep splat: > ====================================================== > [ INFO: possible circular locking dependency detected ] > 3.12.0-rc3+ #92 Not tainted > ------------------------------------------------------- > trinity-child2/15191 is trying to acquire lock: > (&rdp->nocb_wq){......}, at: [<ffffffff8108ff43>] __wake_up+0x23/0x50 > > but task is already holding lock: > (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230 > > which lock already depends on the new lock. > > > the existing dependency chain (in reverse order) is: > > -> #3 (&ctx->lock){-.-...}: > [<ffffffff810cc243>] lock_acquire+0x93/0x200 > [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80 > [<ffffffff811500ff>] __perf_event_task_sched_out+0x2df/0x5e0 > [<ffffffff81091b83>] perf_event_task_sched_out+0x93/0xa0 > [<ffffffff81732052>] __schedule+0x1d2/0xa20 > [<ffffffff81732f30>] preempt_schedule_irq+0x50/0xb0 > [<ffffffff817352b6>] retint_kernel+0x26/0x30 > [<ffffffff813eed04>] tty_flip_buffer_push+0x34/0x50 > [<ffffffff813f0504>] pty_write+0x54/0x60 > [<ffffffff813e900d>] n_tty_write+0x32d/0x4e0 > [<ffffffff813e5838>] tty_write+0x158/0x2d0 > [<ffffffff811c4850>] vfs_write+0xc0/0x1f0 > [<ffffffff811c52cc>] SyS_write+0x4c/0xa0 > [<ffffffff8173d4e4>] tracesys+0xdd/0xe2 > > -> #2 (&rq->lock){-.-.-.}: > [<ffffffff810cc243>] lock_acquire+0x93/0x200 > [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80 > [<ffffffff810980b2>] wake_up_new_task+0xc2/0x2e0 > [<ffffffff81054336>] do_fork+0x126/0x460 > [<ffffffff81054696>] kernel_thread+0x26/0x30 > [<ffffffff8171ff93>] rest_init+0x23/0x140 > [<ffffffff81ee1e4b>] start_kernel+0x3f6/0x403 > [<ffffffff81ee1571>] x86_64_start_reservations+0x2a/0x2c > [<ffffffff81ee1664>] x86_64_start_kernel+0xf1/0xf4 > > -> #1 (&p->pi_lock){-.-.-.}: > [<ffffffff810cc243>] lock_acquire+0x93/0x200 > [<ffffffff8173419b>] 
_raw_spin_lock_irqsave+0x4b/0x90 > [<ffffffff810979d1>] try_to_wake_up+0x31/0x350 > [<ffffffff81097d62>] default_wake_function+0x12/0x20 > [<ffffffff81084af8>] autoremove_wake_function+0x18/0x40 > [<ffffffff8108ea38>] __wake_up_common+0x58/0x90 > [<ffffffff8108ff59>] __wake_up+0x39/0x50 > [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0 > [<ffffffff81111450>] __call_rcu+0x140/0x820 > [<ffffffff81111b8d>] call_rcu+0x1d/0x20 > [<ffffffff81093697>] cpu_attach_domain+0x287/0x360 > [<ffffffff81099d7e>] build_sched_domains+0xe5e/0x10a0 > [<ffffffff81efa7fc>] sched_init_smp+0x3b7/0x47a > [<ffffffff81ee1f4e>] kernel_init_freeable+0xf6/0x202 > [<ffffffff817200be>] kernel_init+0xe/0x190 > [<ffffffff8173d22c>] ret_from_fork+0x7c/0xb0 > > -> #0 (&rdp->nocb_wq){......}: > [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0 > [<ffffffff810cc243>] lock_acquire+0x93/0x200 > [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90 > [<ffffffff8108ff43>] __wake_up+0x23/0x50 > [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0 > [<ffffffff81111450>] __call_rcu+0x140/0x820 > [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30 > [<ffffffff81149abf>] put_ctx+0x4f/0x70 > [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230 > [<ffffffff81056b8d>] do_exit+0x30d/0xcc0 > [<ffffffff8105893c>] do_group_exit+0x4c/0xc0 > [<ffffffff810589c4>] SyS_exit_group+0x14/0x20 > [<ffffffff8173d4e4>] tracesys+0xdd/0xe2 > > other info that might help us debug this: > > Chain exists of: > &rdp->nocb_wq --> &rq->lock --> &ctx->lock > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(&ctx->lock); > lock(&rq->lock); > lock(&ctx->lock); > lock(&rdp->nocb_wq); > > * DEADLOCK * > > 1 lock held by trinity-child2/15191: > #0: (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230 > > stack backtrace: > CPU: 2 PID: 15191 Comm: trinity-child2 Not tainted 3.12.0-rc3+ #92 > ffffffff82565b70 ffff880070c2dbf8 ffffffff8172a363 ffffffff824edf40 > ffff880070c2dc38 ffffffff81726741 
ffff880070c2dc90 ffff88022383b1c0 > ffff88022383aac0 0000000000000000 ffff88022383b188 ffff88022383b1c0 > Call Trace: > [<ffffffff8172a363>] dump_stack+0x4e/0x82 > [<ffffffff81726741>] print_circular_bug+0x200/0x20f > [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0 > [<ffffffff810c6439>] ? get_lock_stats+0x19/0x60 > [<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80 > [<ffffffff810cc243>] lock_acquire+0x93/0x200 > [<ffffffff8108ff43>] ? __wake_up+0x23/0x50 > [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90 > [<ffffffff8108ff43>] ? __wake_up+0x23/0x50 > [<ffffffff8108ff43>] __wake_up+0x23/0x50 > [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0 > [<ffffffff81111450>] __call_rcu+0x140/0x820 > [<ffffffff8109bc8f>] ? local_clock+0x3f/0x50 > [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30 > [<ffffffff81149abf>] put_ctx+0x4f/0x70 > [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230 > [<ffffffff81056b8d>] do_exit+0x30d/0xcc0 > [<ffffffff810c9af5>] ? trace_hardirqs_on_caller+0x115/0x1e0 > [<ffffffff810c9bcd>] ? trace_hardirqs_on+0xd/0x10 > [<ffffffff8105893c>] do_group_exit+0x4c/0xc0 > [<ffffffff810589c4>] SyS_exit_group+0x14/0x20 > [<ffffffff8173d4e4>] tracesys+0xdd/0xe2 The underlying problem is that perf is invoking call_rcu() with the scheduler locks held, but in NOCB mode, call_rcu() will with high probability invoke the scheduler -- which just might want to use its locks. The reason that call_rcu() needs to invoke the scheduler is to wake up the corresponding rcuo callback-offload kthread, which does the job of starting up a grace period and invoking the callbacks afterwards. One solution (championed on a related problem by Lai Jiangshan) is to simply defer the wakeup to some point where scheduler locks are no longer held. Since we don't want to unnecessarily incur the cost of such deferral, the task before us is threefold: 1. Determine when it is likely that a relevant scheduler lock is held. 2. Defer the wakeup in such cases. 3. 
Ensure that all deferred wakeups eventually happen, preferably sooner rather than later. We use irqs_disabled_flags() as a proxy for relevant scheduler locks being held. This works because the relevant locks are always acquired with interrupts disabled. We may defer more often than needed, but that is at least safe. The wakeup deferral is tracked via a new field in the per-CPU and per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup. This flag is checked by the RCU core processing. The __rcu_pending() function now checks this flag, which causes rcu_check_callbacks() to initiate RCU core processing at each scheduling-clock interrupt where this flag is set. Of course this is not sufficient because scheduling-clock interrupts are often turned off (the things we used to be able to count on!). So the flags are also checked on entry to any state that RCU considers to be idle, which includes both NO_HZ_IDLE idle state and NO_HZ_FULL user-mode-execution state. This approach should allow call_rcu() to be invoked regardless of what locks you might be holding, the key word being "should". Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org>	2013-12-03 10:10:18 -08:00
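The three-part deferral scheme can be modeled as follows. Only the `->nocb_defer_wakeup` field name and `do_nocb_deferred_wakeup()` echo the kernel; the structure and the other helpers are hypothetical stand-ins, and `irqs_disabled` is passed in as the proxy for "a scheduler lock might be held".

```c
#include <stdbool.h>

struct fake_rcu_data {
	bool nocb_defer_wakeup; /* a wakeup was deferred */
	int wakeups;            /* rcuo kthread wakeups performed */
};

static void wake_rcuo_kthread(struct fake_rcu_data *rdp)
{
	rdp->wakeups++;
}

/* Enqueue path of call_rcu() for a no-CBs CPU. */
static void nocb_enqueue(struct fake_rcu_data *rdp, bool irqs_disabled)
{
	if (irqs_disabled)
		rdp->nocb_defer_wakeup = true; /* possibly unsafe: defer */
	else
		wake_rcuo_kthread(rdp);
}

/*
 * Run from the scheduling-clock interrupt path and on entry to any RCU
 * idle state, where no scheduler locks are held.
 */
static void do_nocb_deferred_wakeup(struct fake_rcu_data *rdp)
{
	if (!rdp->nocb_defer_wakeup)
		return;
	rdp->nocb_defer_wakeup = false;
	wake_rcuo_kthread(rdp);
}
```

Deferring whenever interrupts are disabled may defer more often than strictly needed, but, as the commit notes, that errs on the safe side.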
Paul E. McKenney	78e4bc34e5	rcu: Fix and comment ordering around wait_event() It is all too easy to forget that wait_event() does not necessarily imply a full memory barrier. The case where it does not is where the condition transitions to true just as wait_event() starts execution. This is actually a feature: The standard use of wait_event() involves locking, in which case the locks provide the needed ordering (you hold a lock across the wake_up() and acquire that same lock after wait_event() returns). Given that I did forget that wait_event() does not necessarily imply a full memory barrier in one case, this commit fixes that case. This commit also adds comments calling out the placement of existing memory barriers relied on by wait_event() calls. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-12-03 10:10:18 -08:00
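The ordering hazard described above can be modeled with C11 atomics. This is an assumption-laden stand-in, not the kernel's wait_event(). The names gp_done, gp_result, publish(), and wait_for_result() are hypothetical, a busy-wait replaces sleeping, and atomic_thread_fence(memory_order_seq_cst) plays the role of smp_mb(). When the condition is already true on entry, nothing on that fast path orders the subsequent data read, so the waiter supplies its own fence.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: data published before a completion flag. */
static atomic_bool gp_done;  /* the wait_event() condition */
static int gp_result;        /* plain data read after the wait */

/* Waker side: publish data, then set the condition. */
static void publish(int value)
{
	gp_result = value;
	/* Order the data store before the flag store (kernel: smp_mb()). */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&gp_done, true, memory_order_relaxed);
	/* A real waker would call wake_up() here. */
}

/* Waiter side: a wait_event()-like loop followed by a data read. */
static int wait_for_result(void)
{
	/*
	 * If gp_done is already true, this returns at once: the "fast path"
	 * that skips the sleep/wakeup machinery. A real wait_event() would
	 * sleep rather than spin while the condition is false.
	 */
	while (!atomic_load_explicit(&gp_done, memory_order_relaxed))
		;
	/* No full barrier is implied above, so supply one explicitly. */
	atomic_thread_fence(memory_order_seq_cst);
	return gp_result;
}
```

The conventional pattern the commit mentions, holding the same lock across wake_up() and acquiring it again after wait_event() returns, would provide this ordering instead of the explicit fences.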
Paul E. McKenney	6193c76aba	rcu: Kick CPU halfway to RCU CPU stall warning When an RCU CPU stall warning occurs, the CPU invokes resched_cpu() on itself. This can help move the grace period forward in some situations, but it would be even better to do this -before- the RCU CPU stall warning. This commit therefore causes resched_cpu() to be called every five jiffies once the system is halfway to an RCU CPU stall warning. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-12-03 10:10:18 -08:00
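The timing rule in this commit can be sketched as follows. The names struct stall_state, stall_init(), stall_check_kick(), and ul_cmp_ge() are hypothetical. A counter stands in for resched_cpu(), and the wraparound-safe comparison uses the usual unsigned-subtraction idiom for jiffies-style counters. The first kick is armed halfway to the stall timeout, and each kick re-arms the next one five jiffies later.

```c
#include <limits.h>
#include <stdbool.h>

#define KICK_INTERVAL 5UL /* jiffies between kicks, per the commit text */

/* Wraparound-safe "a >= b" for free-running unsigned long counters. */
static bool ul_cmp_ge(unsigned long a, unsigned long b)
{
	return ULONG_MAX / 2 >= a - b;
}

struct stall_state {
	unsigned long next_kick; /* hypothetical: when to kick next */
	unsigned kicks;          /* how many times resched_cpu() would run */
};

/* At grace-period start, arm the first kick halfway to the stall warning. */
static void stall_init(struct stall_state *s, unsigned long gp_start,
		       unsigned long stall_timeout)
{
	s->next_kick = gp_start + stall_timeout / 2;
	s->kicks = 0;
}

/* Per-tick check; returns true when the CPU would be kicked. */
static bool stall_check_kick(struct stall_state *s, unsigned long now)
{
	if (!ul_cmp_ge(now, s->next_kick))
		return false;
	s->kicks++;                         /* stand-in for resched_cpu() */
	s->next_kick = now + KICK_INTERVAL; /* then every five jiffies */
	return true;
}
```

Using unsigned subtraction rather than a plain "now >= next_kick" keeps the check correct when the jiffies counter wraps between arming and checking.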
Linus Torvalds	b29c8306a3	Merge tag 'trace-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing update from Steven Rostedt: "This batch of changes is mostly clean ups and small bug fixes. The only real feature that was added this release is from Namhyung Kim, who introduced "set_graph_notrace" filter that lets you run the function graph tracer and not trace particular functions and their call chain. Tom Zanussi added some updates to the ftrace multibuffer tracing that made it more consistent with the top level tracing. One of the fixes for perf function tracing required an API change in RCU; the addition of "rcu_is_watching()".
As Paul McKenney is pushing that change in this release too, he gave me a branch that included all the changes to get that working, and I pulled that into my tree in order to complete the perf function tracing fix"
* tag 'trace-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Add rcu annotation for syscall trace descriptors
  tracing: Do not use signed enums with unsigned long long in fgragh output
  tracing: Remove unused function ftrace_off_permanent()
  tracing: Do not assign filp->private_data to freed memory
  tracing: Add helper function tracing_is_disabled()
  tracing: Open tracer when ftrace_dump_on_oops is used
  tracing: Add support for SOFT_DISABLE to syscall events
  tracing: Make register/unregister_ftrace_command __init
  tracing: Update event filters for multibuffer
  recordmcount.pl: Add support for __fentry__
  ftrace: Have control op function callback only trace when RCU is watching
  rcu: Do not trace rcu_is_watching() functions
  ftrace/x86: skip over the breakpoint for ftrace caller
  trace/trace_stat: use rbtree postorder iteration helper instead of opencoding
  ftrace: Add set_graph_notrace filter
  ftrace: Narrow down the protected area of graph_lock
  ftrace: Introduce struct ftrace_graph_data
  ftrace: Get rid of ftrace_graph_filter_enabled
  tracing: Fix potential out-of-bounds in trace_get_user()
  tracing: Show more exact help information about snapshot	2013-11-16 12:23:18 -08:00
Linus Torvalds	39cf275a1a	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
"The main changes in this cycle are:
- (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik van Riel, Peter Zijlstra et al. Yay!
- optimize preemption counter handling: merge the NEED_RESCHED flag into the preempt_count variable, by Peter Zijlstra.
- wait.h fixes and code reorganization from Peter Zijlstra
- cfs_bandwidth fixes from Ben Segall
- SMP load-balancer cleanups from Peter Zijlstra
- idle balancer improvements from Jason Low
- other fixes and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
  ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
  stop_machine: Fix race between stop_two_cpus() and stop_cpus()
  sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
  sched: Fix asymmetric scheduling for POWER7
  sched: Move completion code from core.c to completion.c
  sched: Move wait code from core.c to wait.c
  sched: Move wait.c into kernel/sched/
  sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
  sched: Avoid throttle_cfs_rq() racing with period_timer stopping
  sched: Guarantee new group-entities always have weight
  sched: Fix hrtimer_cancel()/rq->lock deadlock
  sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
  sched: Fix race on toggling cfs_bandwidth_used
  sched: Remove extra put_online_cpus() inside sched_setaffinity()
  sched/rt: Fix task_tick_rt() comment
  sched/wait: Fix build breakage
  sched/wait: Introduce prepare_to_wait_event()
  sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
  sched: Remove get_online_cpus() usage
  sched: Fix race in migrate_swap_stop()
  ...	2013-11-12 10:20:12 +09:00
Paul E. McKenney	4102adab91	rcu: Move RCU-related source code to kernel/rcu directory Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org>	2013-10-15 12:53:31 -07:00

690 Commits