Commit Graph

16 Commits

Author SHA1 Message Date
Paul E. McKenney
4e6ea4ef56 srcu: Make early-boot call_srcu() reuse workqueue lists
Allocating a list_head structure that is almost never used, and, when
used, is used only during early boot (rcu_init() and earlier), is a bit
wasteful.  This commit therefore eliminates that list_head in favor of
the one in the work_struct structure.  This is safe because the work_struct
structure cannot be used until after rcu_init() returns.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-30 16:10:49 -07:00
Paul E. McKenney
e0fcba9ac0 srcu: Make call_srcu() available during very early boot
Event tracing is moving to SRCU in order to take advantage of the fact
that SRCU may be safely used from idle and even offline CPUs.  However,
event tracing can invoke call_srcu() very early in the boot process,
even before workqueue_init_early() is invoked (let alone rcu_init()).
Therefore, call_srcu()'s attempts to queue work fail miserably.

This commit therefore detects this situation, and refrains from attempting
to queue work before rcu_init() time, but does everything else that it
would have done, and in addition, adds the srcu_struct to a global list.
The rcu_init() function now invokes a new srcu_init() function, which
is empty if CONFIG_SRCU=n.  Otherwise, srcu_init() queues work for
each srcu_struct on the list.  This all happens early enough in boot
that there is but a single CPU with interrupts disabled, which allows
synchronization to be dispensed with.

Of course, the queued work won't actually be invoked until after
workqueue_init() is invoked, which happens shortly after the scheduler
is up and running.  This means that although call_srcu() may be invoked
any time after per-CPU variables have been set up, there is still a very
narrow window when synchronize_srcu() won't work, and this window
extends from the time that the scheduler starts until the time that
workqueue_init() returns.  This can be fixed in a manner similar to
the fix for synchronize_rcu_expedited() and friends, but until someone
actually needs to use synchronize_srcu() during this window, this fix
is added churn for no benefit.

Finally, note that Tree SRCU's new srcu_init() function invokes
queue_work() rather than the queue_delayed_work() function that is
invoked post-boot.  The reason is that queue_delayed_work() will (as you
would expect) post a timer, and timers have not yet been initialized.
So use of queue_work() avoids the complaints about use of uninitialized
spinlocks that would otherwise result.  Besides, some delay is already
provide by the aforementioned fact that the queued work won't actually
be invoked until after the scheduler is up and running.

Requested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-30 16:10:19 -07:00
Sebastian Andrzej Siewior
9c80172b90 kernel/SRCU: provide a static initializer
There are macros for static initializer for the three out of four
possible notifier types, that are:
	ATOMIC_NOTIFIER_HEAD()
	BLOCKING_NOTIFIER_HEAD()
	RAW_NOTIFIER_HEAD()

This patch provides a static initilizer for the forth type to make it
complete.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-30 10:11:34 +02:00
Paul E. McKenney
d633198088 srcu: Prohibit call_srcu() use under raw spinlocks
Invoking queue_delayed_work() while holding a raw spinlock is forbidden
in -rt kernels, which is exactly what __call_srcu() does, indirectly via
srcu_funnel_gp_start().  This commit therefore downgrades Tree SRCU's
locking from raw to non-raw spinlocks, which works because call_srcu()
is not ever called while holding a raw spinlock.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:52:33 -08:00
Paul E. McKenney
656e7c0c0a Merge branches 'doc.2017.08.17a', 'fixes.2017.08.17a', 'hotplug.2017.07.25b', 'misc.2017.08.17a', 'spin_unlock_wait_no.2017.08.17a', 'srcu.2017.07.27c' and 'torture.2017.07.24c' into HEAD
doc.2017.08.17a: Documentation updates.
fixes.2017.08.17a: RCU fixes.
hotplug.2017.07.25b: CPU-hotplug updates.
misc.2017.08.17a: Miscellaneous fixes outside of RCU (give or take conflicts).
spin_unlock_wait_no.2017.08.17a: Remove spin_unlock_wait().
srcu.2017.07.27c: SRCU updates.
torture.2017.07.24c: Torture-test updates.
2017-08-17 08:10:04 -07:00
Paul E. McKenney
115a1a5285 rcutorture: Move SRCU status printing to SRCU implementations
This commit gets rid of some ugly #ifdefs in rcutorture.c by moving
the SRCU status printing to the SRCU implementations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:04:08 -07:00
Paul E. McKenney
0d8a1e831e srcu: Make process_srcu() be static
The function process_srcu() is not invoked outside of srcutree.c, so
this commit makes it static and drops the EXPORT_SYMBOL_GPL().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-24 16:03:23 -07:00
Paul E. McKenney
a3883df393 srcu: Use rnp->lock wrappers to replace explicit memory barriers
This commit uses TREE RCU's rnp->lock wrappers to replace a few explicit
memory barriers.  This change also has the advantage of making SRCU's
memory-ordering properties be implemented in roughly the same way as they
are in Tree RCU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:38 -07:00
Paul E. McKenney
5a0465e17a srcu: Shrink srcu.h by moving docbook and private function
The call_srcu() docbook entry is currently in include/linux/srcu.h,
which causes needless processing for each include point.  This commit
therefore moves this entry to kernel/rcu/srcutree.c, which the compiler
reads only once.  In addition, the srcu_batches_completed() function is
used only within RCU and its torture-test suites.  This commit therefore
also moves this function's declaration from include/linux/srcutiny.h,
include/linux/srcutree.h, and include/linux/srcuclassic.h to
kernel/rcu/rcu.h.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:35 -07:00
Paul E. McKenney
cad7b38972 rcu: Move torture-related definitions from rcupdate.h to rcu.h
The include/linux/rcupdate.h file contains a number of definitions that
are used only to communicate between rcutorture, rcuperf, and the RCU code
itself.  There is no point in having these definitions exposed globally
throughout the kernel, so this commit moves them to kernel/rcu/rcu.h.
This change has the added benefit of shrinking rcupdate.h.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:28 -07:00
Paul E. McKenney
22607d66bb srcu: Specify auto-expedite holdoff time
On small systems, in the absence of readers, expedited SRCU grace
periods can complete in less than a microsecond.  This means that an
eight-CPU system can have all CPUs doing synchronize_srcu() in a tight
loop and almost always expedite.  This might actually be desirable in
some situations, but in general it is a good way to needlessly burn
CPU cycles.  And in those situations where it is desirable, your friend
is the function synchronize_srcu_expedited().

For other situations, this commit adds a kernel parameter that specifies
a holdoff between completing the last SRCU grace period and auto-expediting
the next.  If the next grace period starts before the holdoff expires,
auto-expediting is disabled.  The holdoff is 50 microseconds by default,
and can be tuned to the desired number of nanoseconds.  A value of zero
disables auto-expediting.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26 16:32:17 -07:00
Paul E. McKenney
1e9a038b7f srcu: Expedited grace periods with reduced memory contention
Commit f60d231a87 ("srcu: Crude control of expedited grace periods")
introduced a per-srcu_struct atomic counter to track outstanding
requests for grace periods.  This works, but represents a memory-contention
bottleneck.  This commit therefore uses the srcu_node combining tree
to remove this bottleneck.

This commit adds new ->srcu_gp_seq_needed_exp fields to the
srcu_data, srcu_node, and srcu_struct structures, which track the
farthest-in-the-future grace period that must be expedited, which in
turn requires that all nearer-term grace periods also be expedited.
Requests for expediting start with the srcu_data structure, run up
through the srcu_node tree, and end at the srcu_struct structure.
Note that it may be necessary to expedite a grace period that just
now started, and this is handled by a new srcu_funnel_exp_start()
function, which is invoked when the grace period itself is already
in its way, but when that grace period was not marked as expedited.

A new srcu_get_delay() function returns zero if there is at least one
expedited SRCU grace period in flight, or SRCU_INTERVAL otherwise.
This function is used to calculate delays:  Normal grace periods
are allowed to extend in order to cover more requests with a given
grace-period computation, which decreases per-request overhead.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26 16:32:16 -07:00
Paul E. McKenney
7f6733c3c6 srcu: Make rcutorture writer stalls print SRCU GP state
In the past, SRCU was simple enough that there was little point in
making the rcutorture writer stall messages print the SRCU grace-period
number state.  With the advent of Tree SRCU, this has changed.  This
commit therefore makes Classic, Tiny, and Tree SRCU report this state
to rcutorture as needed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26 11:23:28 -07:00
Paul E. McKenney
c7e88067c1 srcu: Exact tracking of srcu_data structures containing callbacks
The current Tree SRCU implementation schedules a workqueue for every
srcu_data covered by a given leaf srcu_node structure having callbacks,
even if only one of those srcu_data structures actually contains
callbacks.  This is clearly inefficient for workloads that don't feature
callbacks everywhere all the time.  This commit therefore adds an array
of masks that are used by the leaf srcu_node structures to track exactly
which srcu_data structures contain callbacks.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <efault@gmx.de>
2017-04-26 11:23:12 -07:00
Paul E. McKenney
da915ad5cf srcu: Parallelize callback handling
Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1,2],
however, there are workloads that could result in a high volume of
concurrent invocations of call_srcu(), which with current SRCU would
result in excessive lock contention on the srcu_struct structure's
->queue_lock, which protects SRCU's callback lists.  This commit therefore
moves SRCU to per-CPU callback lists, thus greatly reducing contention.

Because a given SRCU instance no longer has a single centralized callback
list, starting grace periods and invoking callbacks are both more complex
than in the single-list Classic SRCU implementation.  Starting grace
periods and handling callbacks are now handled using an srcu_node tree
that is in some ways similar to the rcu_node trees used by RCU-bh,
RCU-preempt, and RCU-sched (for example, the srcu_node tree shape is
controlled by exactly the same Kconfig options and boot parameters that
control the shape of the rcu_node tree).

In addition, the old per-CPU srcu_array structure is now named srcu_data
and contains an rcu_segcblist structure named ->srcu_cblist for its
callbacks (and a spinlock to protect this).  The srcu_struct gets
an srcu_gp_seq that is used to associate callback segments with the
corresponding completion-time grace-period number.  These completion-time
grace-period numbers are propagated up the srcu_node tree so that the
grace-period workqueue handler can determine whether additional grace
periods are needed on the one hand and where to look for callbacks that
are ready to be invoked.

The srcu_barrier() function must now wait on all instances of the per-CPU
->srcu_cblist.  Because each ->srcu_cblist is protected by ->lock,
srcu_barrier() can remotely add the needed callbacks.  In theory,
it could also remotely start grace periods, but in practice doing so
is complex and racy.  And interestingly enough, it is never necessary
for srcu_barrier() to start a grace period because srcu_barrier() only
enqueues a callback when a callback is already present--and it turns out
that a grace period has to have already been started for this pre-existing
callback.  Furthermore, it is only the callback that srcu_barrier()
needs to wait on, not any particular grace period.  Therefore, a new
rcu_segcblist_entrain() function enqueues the srcu_barrier() function's
callback into the same segment occupied by the last pre-existing callback
in the list.  The special case where all the pre-existing callbacks are
on a different list (because they are in the process of being invoked)
is handled by enqueuing srcu_barrier()'s callback into the RCU_DONE_TAIL
segment, relying on the done-callbacks check that takes place after all
callbacks are inovked.

Note that the readers use the same algorithm as before.  Note that there
is a separate srcu_idx that tells the readers what counter to increment.
This unfortunately cannot be combined with srcu_gp_seq because they
need to be incremented at different times.

This commit introduces some ugly #ifdefs in rcutorture.  These will go
away when I feel good enough about Tree SRCU to ditch Classic SRCU.

Some crude performance comparisons, courtesy of a quickly hacked rcuperf
asynchronous-grace-period capability:

			Callback Queuing Overhead
			-------------------------
	# CPUS		Classic SRCU	Tree SRCU
	------          ------------    ---------
	     2              0.349 us     0.342 us
	    16             31.66  us     0.4   us
	    41             ---------     0.417 us

The times are the 90th percentiles, a statistic that was chosen to reject
the overheads of the occasional srcu_barrier() call needed to avoid OOMing
the test machine.  The rcuperf test hangs when running Classic SRCU at 41
CPUs, hence the line of dashes.  Despite the hacks to both the rcuperf code
and that statistics, this is a convincing demonstration of Tree SRCU's
performance and scalability advantages.

[1] https://lwn.net/Articles/309030/
[2] https://patchwork.kernel.org/patch/5108281/

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix initialization if synchronize_srcu_expedited() called first. ]
2017-04-21 05:59:26 -07:00
Paul E. McKenney
d8be81735a srcu: Create a tiny SRCU
In response to automated complaints about modifications to SRCU
increasing its size, this commit creates a tiny SRCU that is
used in SMP=n && PREEMPT=n builds.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18 11:38:22 -07:00