mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git (synced 2025-01-18 12:36:10 +07:00)
doc: Update RCU documentation
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
parent f99bcb2cdb
commit 4de5f89ef8
@@ -23,6 +23,14 @@ over a rather long period of time, but improvements are always welcome!
 Yet another exception is where the low real-time latency of RCU's
 read-side primitives is critically important.

+One final exception is where RCU readers are used to prevent
+the ABA problem (https://en.wikipedia.org/wiki/ABA_problem)
+for lockless updates. This does result in the mildly
+counter-intuitive situation where rcu_read_lock() and
+rcu_read_unlock() are used to protect updates; however, this
+approach provides the same potential simplifications that garbage
+collectors do.
+
 1. Does the update code have proper mutual exclusion?

 RCU does allow -readers- to run (almost) naked, but -writers- must
@@ -40,7 +48,9 @@ over a rather long period of time, but improvements are always welcome!
 explain how this single task does not become a major bottleneck on
 big multiprocessor machines (for example, if the task is updating
 information relating to itself that other tasks can read, there
-by definition can be no bottleneck).
+by definition can be no bottleneck). Note that the definition
+of "large" has changed significantly: Eight CPUs was "large"
+in the year 2000, but a hundred CPUs was unremarkable in 2017.

 2. Do the RCU read-side critical sections make proper use of
 rcu_read_lock() and friends? These primitives are needed
@@ -55,6 +65,12 @@ over a rather long period of time, but improvements are always welcome!
 Disabling of preemption can serve as rcu_read_lock_sched(), but
 is less readable.

+Letting RCU-protected pointers "leak" out of an RCU read-side
+critical section is every bit as bad as letting them leak out
+from under a lock. Unless, of course, you have arranged some
+other means of protection, such as a lock or a reference count
+-before- letting them out of the RCU read-side critical section.
+
 3. Does the update code tolerate concurrent accesses?

 The whole point of RCU is to permit readers to run without
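As an illustration of the reference-count escape hatch described in item 2 above, here is a minimal userspace sketch using C11 atomics. It is a model under stated assumptions, not kernel code: struct obj, obj_tryget(), obj_put(), and lookup_and_pin() are hypothetical names, and the places where rcu_read_lock() and rcu_read_unlock() would go are marked only by comments.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object that readers find through a shared pointer. */
struct obj {
	atomic_int refcnt;	/* 0 means the object is already being freed */
	int payload;
};

/*
 * Try to take a reference.  Fails if the count has already dropped to
 * zero, in which case the caller must not keep the pointer.
 */
static bool obj_tryget(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);	/* dropping a reference nobody holds is a bug */
	return old == 1;
}

/*
 * Reader-side lookup.  The reference is taken -before- the pointer
 * leaves the (modelled) read-side critical section.
 */
static struct obj *lookup_and_pin(struct obj *_Atomic *slot)
{
	struct obj *o;

	/* read-side critical section would begin here */
	o = atomic_load_explicit(slot, memory_order_acquire);
	if (o && !obj_tryget(o))
		o = NULL;	/* lost the race with the final put */
	/* read-side critical section would end here */
	return o;		/* pinned by a reference, or NULL */
}
```

A caller that receives a non-NULL pointer from lookup_and_pin() may keep using it after the critical section ends, and must eventually call obj_put(). A caller that skips the reference must not let the pointer escape.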
@@ -78,10 +94,10 @@ over a rather long period of time, but improvements are always welcome!

 This works quite well, also.

-c. Make updates appear atomic to readers. For example,
+c. Make updates appear atomic to readers. For example,
 pointer updates to properly aligned fields will
 appear atomic, as will individual atomic primitives.
-Sequences of perations performed under a lock will -not-
+Sequences of operations performed under a lock will -not-
 appear to be atomic to RCU readers, nor will sequences
 of multiple atomic primitives.

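Item 3c above can be illustrated with a minimal userspace sketch: the updater builds a new version privately and publishes it with a single pointer store, so a reader that loads the pointer once sees either the old version or the new one, never a mixture. This is only an analogy in C11 atomics; struct cfg, cur_cfg, cfg_publish(), and cfg_sum() are hypothetical names. In kernel code the store and load would be rcu_assign_pointer() and rcu_dereference().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical two-field configuration that readers must see consistently. */
struct cfg {
	int a;
	int b;
};

/* Shared pointer; each reader loads it exactly once per access. */
static struct cfg *_Atomic cur_cfg;

/*
 * Updater: fill in the new version privately, then publish it with one
 * pointer exchange.  The previous version is handed back through oldp;
 * in RCU code it may be freed only after a grace period has elapsed.
 * Returns 0 on success, -1 if allocation fails (nothing is published).
 */
static int cfg_publish(int a, int b, struct cfg **oldp)
{
	struct cfg *newc = malloc(sizeof(*newc));

	assert(oldp != NULL);
	if (!newc)
		return -1;
	newc->a = a;
	newc->b = b;
	/* Release semantics order the field stores before the pointer store. */
	*oldp = atomic_exchange_explicit(&cur_cfg, newc, memory_order_acq_rel);
	return 0;
}

/* Reader: a single pointer load, so both fields come from one version. */
static int cfg_sum(void)
{
	struct cfg *c = atomic_load_explicit(&cur_cfg, memory_order_acquire);

	return c ? c->a + c->b : 0;
}
```

By contrast, updating a and b in place under a lock would let a lockless reader observe the new a together with the old b, which is the non-atomic sequence that item 3c warns about.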
@@ -168,8 +184,8 @@ over a rather long period of time, but improvements are always welcome!

 5. If call_rcu(), or a related primitive such as call_rcu_bh(),
 call_rcu_sched(), or call_srcu() is used, the callback function
-must be written to be called from softirq context. In particular,
-it cannot block.
+will be called from softirq context. In particular, it cannot
+block.

 6. Since synchronize_rcu() can block, it cannot be called from
 any sort of irq context. The same rule applies for
@@ -178,11 +194,14 @@ over a rather long period of time, but improvements are always welcome!
 synchronize_sched_expedited(), and synchronize_srcu_expedited().

 The expedited forms of these primitives have the same semantics
-as the non-expedited forms, but expediting is both expensive
-and unfriendly to real-time workloads. Use of the expedited
-primitives should be restricted to rare configuration-change
-operations that would not normally be undertaken while a real-time
-workload is running.
+as the non-expedited forms, but expediting is both expensive and
+(with the exception of synchronize_srcu_expedited()) unfriendly
+to real-time workloads. Use of the expedited primitives should
+be restricted to rare configuration-change operations that would
+not normally be undertaken while a real-time workload is running.
+However, real-time workloads can use the rcupdate.rcu_normal kernel
+boot parameter to completely disable expedited grace periods,
+though this might have performance implications.

 In particular, if you find yourself invoking one of the expedited
 primitives repeatedly in a loop, please do everyone a favor:
@@ -193,11 +212,6 @@ over a rather long period of time, but improvements are always welcome!
 of the system, especially to real-time workloads running on
 the rest of the system.

-In addition, it is illegal to call the expedited forms from
-a CPU-hotplug notifier, or while holding a lock that is acquired
-by a CPU-hotplug notifier. Failing to observe this restriction
-will result in deadlock.
-
 7. If the updater uses call_rcu() or synchronize_rcu(), then the
 corresponding readers must use rcu_read_lock() and
 rcu_read_unlock(). If the updater uses call_rcu_bh() or
@@ -321,7 +335,7 @@ over a rather long period of time, but improvements are always welcome!
 Similarly, disabling preemption is not an acceptable substitute
 for rcu_read_lock(). Code that attempts to use preemption
 disabling where it should be using rcu_read_lock() will break
-in real-time kernel builds.
+in CONFIG_PREEMPT=y kernel builds.

 If you want to wait for interrupt handlers, NMI handlers, and
 code under the influence of preempt_disable(), you instead
@@ -356,23 +370,22 @@ over a rather long period of time, but improvements are always welcome!
 not the case, a self-spawning RCU callback would prevent the
 victim CPU from ever going offline.)

-14. SRCU (srcu_read_lock(), srcu_read_unlock(), srcu_dereference(),
-synchronize_srcu(), synchronize_srcu_expedited(), and call_srcu())
-may only be invoked from process context. Unlike other forms of
-RCU, it -is- permissible to block in an SRCU read-side critical
-section (demarked by srcu_read_lock() and srcu_read_unlock()),
-hence the "SRCU": "sleepable RCU". Please note that if you
-don't need to sleep in read-side critical sections, you should be
-using RCU rather than SRCU, because RCU is almost always faster
-and easier to use than is SRCU.
+14. Unlike other forms of RCU, it -is- permissible to block in an
+SRCU read-side critical section (demarked by srcu_read_lock()
+and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
+Please note that if you don't need to sleep in read-side critical
+sections, you should be using RCU rather than SRCU, because RCU
+is almost always faster and easier to use than is SRCU.

-Also unlike other forms of RCU, explicit initialization
-and cleanup is required via init_srcu_struct() and
-cleanup_srcu_struct(). These are passed a "struct srcu_struct"
-that defines the scope of a given SRCU domain. Once initialized,
-the srcu_struct is passed to srcu_read_lock(), srcu_read_unlock()
-synchronize_srcu(), synchronize_srcu_expedited(), and call_srcu().
-A given synchronize_srcu() waits only for SRCU read-side critical
+Also unlike other forms of RCU, explicit initialization and
+cleanup are required either at build time via DEFINE_SRCU()
+or DEFINE_STATIC_SRCU() or at runtime via init_srcu_struct()
+and cleanup_srcu_struct(). These last two are passed a
+"struct srcu_struct" that defines the scope of a given
+SRCU domain. Once initialized, the srcu_struct is passed
+to srcu_read_lock(), srcu_read_unlock(), synchronize_srcu(),
+synchronize_srcu_expedited(), and call_srcu(). A given
+synchronize_srcu() waits only for SRCU read-side critical
 sections governed by srcu_read_lock() and srcu_read_unlock()
 calls that have been passed the same srcu_struct. This property
 is what makes sleeping read-side critical sections tolerable --
@@ -390,10 +403,16 @@ over a rather long period of time, but improvements are always welcome!
 Therefore, SRCU should be used in preference to rw_semaphore
 only in extremely read-intensive situations, or in situations
 requiring SRCU's read-side deadlock immunity or low read-side
-realtime latency.
+realtime latency. You should also consider percpu_rw_semaphore
+when you need lightweight readers.

-Note that, rcu_assign_pointer() relates to SRCU just as it does
-to other forms of RCU.
+SRCU's expedited primitive (synchronize_srcu_expedited())
+never sends IPIs to other CPUs, so it is easier on
+real-time workloads than is synchronize_rcu_expedited(),
+synchronize_rcu_bh_expedited() or synchronize_sched_expedited().
+
+Note that rcu_dereference() and rcu_assign_pointer() relate to
+SRCU just as they do to other forms of RCU.

 15. The whole point of call_rcu(), synchronize_rcu(), and friends
 is to wait until all pre-existing readers have finished before
@@ -435,3 +454,33 @@ over a rather long period of time, but improvements are always welcome!

 These debugging aids can help you find problems that are
 otherwise extremely difficult to spot.
+
+18. If you register a callback using call_rcu(), call_rcu_bh(),
+call_rcu_sched(), or call_srcu(), and pass in a function defined
+within a loadable module, then it is necessary to wait for
+all pending callbacks to be invoked after the last invocation
+and before unloading that module. Note that it is absolutely
+-not- sufficient to wait for a grace period! The current (say)
+synchronize_rcu() implementation waits only for all previous
+callbacks registered on the CPU that synchronize_rcu() is running
+on, but it is -not- guaranteed to wait for callbacks registered
+on other CPUs.
+
+You instead need to use one of the barrier functions:
+
+o call_rcu() -> rcu_barrier()
+o call_rcu_bh() -> rcu_barrier_bh()
+o call_rcu_sched() -> rcu_barrier_sched()
+o call_srcu() -> srcu_barrier()
+
+However, these barrier functions are absolutely -not- guaranteed
+to wait for a grace period. In fact, if there are no call_rcu()
+callbacks waiting anywhere in the system, rcu_barrier() is within
+its rights to return immediately.
+
+So if you need to wait for both an RCU grace period and for
+all pre-existing call_rcu() callbacks, you will need to execute
+both rcu_barrier() and synchronize_rcu(), if necessary, using
+something like workqueues to execute them concurrently.
+
+See rcubarrier.txt for more information.

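The distinction drawn in item 18 above, between waiting for a grace period and waiting for callbacks, can be made concrete with a toy, single-threaded model. Everything here is an assumption made for illustration: toy_call(), toy_synchronize(), and toy_barrier() are hypothetical stand-ins for call_rcu(), synchronize_rcu(), and rcu_barrier(), and the model simply invokes queued callbacks inside the barrier rather than tracking real grace periods.

```c
#include <assert.h>
#include <stddef.h>

/* Toy, single-threaded model; every name here is hypothetical. */
struct toy_cb {
	struct toy_cb *next;
	void (*func)(struct toy_cb *cb);
};

static struct toy_cb *toy_pending;	/* queued, not yet invoked */
static unsigned long toy_gp_count;	/* grace periods waited for */
static int toy_freed;			/* set by the example callback */

/* Models call_rcu(): queue the callback and return at once. */
static void toy_call(struct toy_cb *cb, void (*func)(struct toy_cb *cb))
{
	assert(func != NULL);
	cb->func = func;
	cb->next = toy_pending;
	toy_pending = cb;
}

/*
 * Models synchronize_rcu(): a grace period elapses, but queued
 * callbacks are not guaranteed to have been invoked on return.
 */
static void toy_synchronize(void)
{
	toy_gp_count++;
}

/*
 * Models rcu_barrier(): returns only after every queued callback has
 * been invoked.  With an empty queue it returns immediately, without
 * waiting for a grace period.
 */
static void toy_barrier(void)
{
	while (toy_pending) {
		struct toy_cb *cb = toy_pending;

		toy_pending = cb->next;
		cb->func(cb);
	}
}

/* Example callback, standing in for a function defined in a module. */
static void toy_free_cb(struct toy_cb *cb)
{
	(void)cb;
	toy_freed = 1;
}
```

In this model, calling toy_synchronize() after toy_call() leaves toy_freed at 0, so unloading the module at that point would leave a callback pointing at code that is about to vanish. Only toy_barrier() makes it safe, and a module exit path would first stop queueing new callbacks and then invoke the barrier.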
@@ -76,15 +76,12 @@ o I hear that RCU is patented? What is with that?
 Of these, one was allowed to lapse by the assignee, and the
 others have been contributed to the Linux kernel under GPL.
 There are now also LGPL implementations of user-level RCU
-available (http://lttng.org/?q=node/18).
+available (http://liburcu.org/).

 o I hear that RCU needs work in order to support realtime kernels?

-This work is largely completed. Realtime-friendly RCU can be
-enabled via the CONFIG_PREEMPT_RCU kernel configuration
-parameter. However, work is in progress for enabling priority
-boosting of preempted RCU read-side critical sections. This is
-needed if you have CPU-bound realtime threads.
+Realtime-friendly RCU can be enabled via the CONFIG_PREEMPT_RCU
+kernel configuration parameter.

 o Where can I find more information on RCU?

@@ -263,6 +263,11 @@ Quick Quiz #2: What happens if CPU 0's rcu_barrier_func() executes
 are delayed for a full grace period? Couldn't this result in
 rcu_barrier() returning prematurely?

+The current rcu_barrier() implementation is more complex, due to the need
+to avoid disturbing idle CPUs (especially on battery-powered systems)
+and the need to minimally disturb non-idle CPUs in real-time systems.
+However, the code above illustrates the concepts.
+

 rcu_barrier() Summary

@@ -276,15 +276,17 @@ o "Free-Block Circulation": Shows the number of torture structures
 somehow gets incremented farther than it should.

 Different implementations of RCU can provide implementation-specific
-additional information. For example, SRCU provides the following
+additional information. For example, Tree SRCU provides the following
 additional line:

-srcu-torture: per-CPU(idx=1): 0(0,1) 1(0,1) 2(0,0) 3(0,1)
+srcud-torture: Tree SRCU per-CPU(idx=0): 0(35,-21) 1(-4,24) 2(1,1) 3(-26,20) 4(28,-47) 5(-9,4) 6(-10,14) 7(-14,11) T(1,6)

-This line shows the per-CPU counter state. The numbers in parentheses are
-the values of the "old" and "current" counters for the corresponding CPU.
-The "idx" value maps the "old" and "current" values to the underlying
-array, and is useful for debugging.
+This line shows the per-CPU counter state, in this case for Tree SRCU
+using a dynamically allocated srcu_struct (hence "srcud-" rather than
+"srcu-"). The numbers in parentheses are the values of the "old" and
+"current" counters for the corresponding CPU. The "idx" value maps the
+"old" and "current" values to the underlying array, and is useful for
+debugging. The final "T" entry contains the totals of the counters.


 USAGE
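To make the meaning of the final "T" entry concrete, the following small userspace sketch sums the per-CPU "old" and "current" values in a Tree SRCU torture line and compares them with the T totals. The helper name srcu_line_totals_ok() is hypothetical, and the parser assumes the exact layout of the sample line in the hunk above, nothing more general.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sum the per-CPU "(old,current)" pairs in a Tree SRCU torture line and
 * compare them with the trailing "T(old,current)" totals.
 * Returns 1 if the totals match, 0 if they differ, -1 on a parse error.
 */
static int srcu_line_totals_ok(const char *line)
{
	const char *p;
	long sum_old = 0, sum_cur = 0, t_old, t_cur, o, c;
	int cpu, n;

	assert(line != NULL);
	/* Skip the "...per-CPU(idx=N): " prefix. */
	p = strstr(line, "): ");
	if (!p)
		return -1;
	p += 3;

	/* Per-CPU entries look like "3(-26,20)". */
	for (;;) {
		n = 0;
		if (sscanf(p, "%d(%ld,%ld)%n", &cpu, &o, &c, &n) != 3 || n == 0)
			break;
		sum_old += o;
		sum_cur += c;
		p += n;
		while (*p == ' ')
			p++;
	}

	/* The final entry looks like "T(1,6)". */
	if (sscanf(p, "T(%ld,%ld)", &t_old, &t_cur) != 2)
		return -1;
	return sum_old == t_old && sum_cur == t_cur;
}
```

For the sample line above, the "old" values sum to 1 and the "current" values sum to 6, matching T(1,6). Individual per-CPU values can be negative because a reader may enter its critical section on one CPU and leave it on another.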
@@ -304,3 +306,9 @@ checked for such errors. The "rmmod" command forces a "SUCCESS",
 "FAILURE", or "RCU_HOTPLUG" indication to be printk()ed. The first
 two are self-explanatory, while the last indicates that while there
 were no RCU failures, CPU-hotplug problems were detected.
+
+However, the tools/testing/selftests/rcutorture/bin/kvm.sh script
+provides better automation, including automatic failure analysis.
+It assumes a qemu/kvm-enabled platform, and runs guest OSes out of initrd.
+See tools/testing/selftests/rcutorture/doc/initrd.txt for instructions
+on setting up such an initrd.

@@ -890,6 +890,8 @@ SRCU: Critical sections Grace period Barrier
 srcu_read_lock_held

 SRCU: Initialization/cleanup
+DEFINE_SRCU
+DEFINE_STATIC_SRCU
 init_srcu_struct
 cleanup_srcu_struct

@@ -913,7 +915,8 @@ a. Will readers need to block? If so, you need SRCU.
 b. What about the -rt patchset? If readers would need to block
 in a non-rt kernel, you need SRCU. If readers would block
 in a -rt kernel, but not in a non-rt kernel, SRCU is not
-necessary.
+necessary. (The -rt patchset turns spinlocks into sleeplocks,
+hence this distinction.)

 c. Do you need to treat NMI handlers, hardirq handlers,
 and code segments with preemption disabled (whether