Commit Graph

312 Commits

Author SHA1 Message Date
Paul E. McKenney
791875d16e rcu: Eliminate the unused __rcu_is_watching() function
The __rcu_is_watching() function is currently not used, aside from
to implement the rcu_is_watching() function.  This commit therefore
eliminates __rcu_is_watching(), which has the beneficial side-effect
of shrinking include/linux/rcupdate.h a bit.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:30 -07:00
Paul E. McKenney
d0df7a3491 rcu: Move rcupdate.h to new empty-function style
This commit saves a few lines in include/linux/rcupdate.h by moving
to single-line definitions for empty functions, instead of the old
style where the two curly braces each get their own line.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:29 -07:00
Paul E. McKenney
c4cbf9f736 rcu: Remove UINT_CMP_GE() and UINT_CMP_LT()
The UINT_CMP_GE() and UINT_CMP_LT() macros are not used, so this
commit removes them.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:29 -07:00
Paul E. McKenney
cad7b38972 rcu: Move torture-related definitions from rcupdate.h to rcu.h
The include/linux/rcupdate.h file contains a number of definitions that
are used only to communicate between rcutorture, rcuperf, and the RCU code
itself.  There is no point in having these definitions exposed globally
throughout the kernel, so this commit moves them to kernel/rcu/rcu.h.
This change has the added benefit of shrinking rcupdate.h.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:28 -07:00
Paul E. McKenney
25c36329a3 rcu: Move expediting-related access/control out of rcupdate.h
The rcu_gp_is_normal(), rcu_gp_is_expedited(), rcu_expedite_gp(), and
rcu_unexpedite_gp() functions are intended only for use within the
RCU implementation itself -- the sysfs access is what should be used
outside of RCU.  This commit therefore moves the declarations for
these functions to kernel/rcu/rcu.h, and also includes this file into
kernel/rcu/rcutorture.c and kernel/rcu/rcuperf.c.  This also has the
beneficial effect of shrinking rcupdate.c a bit.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:28 -07:00
Paul E. McKenney
3caec62fbb rcu: Move rcu_expedited and rcu_normal externs from rcupdate.h
The rcu_expedited and rcu_normal variables are used only by sysctl
and kernel/rcu/update.c, so it does not make sense to their extern
declarations in rcupdate.h.  This commit therefore moves these
extern declarations to update.c.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:27 -07:00
Paul E. McKenney
a68a2bb28b rcu: Move docbook comments out of rcupdate.h
The include/linux/rcupdate.h file is included by more than 200
files, so shrinking it should provide some build-time benefits.
This commit therefore moves several docbook comments from rcupdate.h to
kernel/rcu/update.c, kernel/rcu/tree.c, and kernel/rcu/tree_plugin.h, thus
reducing the number of times that the compiler has to scan these comments.
This likely provides only a small benefit, but every little bit helps.

This commit also fixes a malformed bulleted list noted by the 0day
Test Robot.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 18:52:27 -07:00
Paul E. McKenney
59d80fd835 rcu: Print out rcupdate.c non-default boot-time settings
This commit adds a rcupdate_announce_bootup_oddness() function to
print out non-default values of significant kernel boot parameter
settings to aid in debugging.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-06-08 08:25:35 -07:00
Linus Torvalds
de4d195308 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes are:

   - Debloat RCU headers

   - Parallelize SRCU callback handling (plus overlapping patches)

   - Improve the performance of Tree SRCU on a CPU-hotplug stress test

   - Documentation updates

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
  rcu: Open-code the rcu_cblist_n_lazy_cbs() function
  rcu: Open-code the rcu_cblist_n_cbs() function
  rcu: Open-code the rcu_cblist_empty() function
  rcu: Separately compile large rcu_segcblist functions
  srcu: Debloat the <linux/rcu_segcblist.h> header
  srcu: Adjust default auto-expediting holdoff
  srcu: Specify auto-expedite holdoff time
  srcu: Expedite first synchronize_srcu() when idle
  srcu: Expedited grace periods with reduced memory contention
  srcu: Make rcutorture writer stalls print SRCU GP state
  srcu: Exact tracking of srcu_data structures containing callbacks
  srcu: Make SRCU be built by default
  srcu: Fix Kconfig botch when SRCU not selected
  rcu: Make non-preemptive schedule be Tasks RCU quiescent state
  srcu: Expedite srcu_schedule_cbs_snp() callback invocation
  srcu: Parallelize callback handling
  kvm: Move srcu_struct fields to end of struct kvm
  rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
  rcu: Use true/false in assignment to bool
  rcu: Use bool value directly
  ...
2017-05-10 10:30:46 -07:00
Paul E. McKenney
bcbfdd01dc rcu: Make non-preemptive schedule be Tasks RCU quiescent state
Currently, a call to schedule() acts as a Tasks RCU quiescent state
only if a context switch actually takes place.  However, just the
call to schedule() guarantees that the calling task has moved off of
whatever tracing trampoline that it might have been one previously.
This commit therefore plumbs schedule()'s "preempt" parameter into
rcu_note_context_switch(), which then records the Tasks RCU quiescent
state, but only if this call to schedule() was -not- due to a preemption.

To avoid adding overhead to the common-case context-switch path,
this commit hides the rcu_note_context_switch() check under an existing
non-common-case check.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-21 05:59:27 -07:00
Paul E. McKenney
77e5849688 rcu: Make arch select smp_mb__after_unlock_lock() strength
The definition of smp_mb__after_unlock_lock() is currently smp_mb()
for CONFIG_PPC and a no-op otherwise.  It would be better to instead
provide an architecture-selectable Kconfig option, and select the
strength of smp_mb__after_unlock_lock() based on that option.  This
commit therefore creates ARCH_WEAK_RELEASE_ACQUIRE, has PPC select it,
and bases the definition of smp_mb__after_unlock_lock() on this new
ARCH_WEAK_RELEASE_ACQUIRE Kconfig option.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Boqun Feng <boqun.feng@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: <linuxppc-dev@lists.ozlabs.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-04-18 11:20:15 -07:00
Steven Rostedt (VMware)
03ecd3f48e rcu/tracing: Add rcu_disabled to denote when rcu_irq_enter() will not work
Tracing uses rcu_irq_enter() as a way to make sure that RCU is watching when
it needs to use rcu_read_lock() and friends. This is because tracing can
happen as RCU is about to enter user space, or about to go idle, and RCU
does not watch for RCU read side critical sections as it makes the
transition.

There is a small location within the RCU infrastructure that rcu_irq_enter()
itself will not work. If tracing were to occur in that section it will break
if it tries to use rcu_irq_enter().

Originally, this happens with the stack_tracer, because it will call
save_stack_trace when it encounters stack usage that is greater than any
stack usage it had encountered previously. There was a case where that
happened in the RCU section where rcu_irq_enter() did not work, and lockdep
complained loudly about it. To fix it, stack tracing added a call to be
disabled and RCU would disable stack tracing during the critical section
that rcu_irq_enter() was inoperable. This solution worked, but there are
other cases that use rcu_irq_enter() and it would be a good idea to let RCU
give a way to let others know that rcu_irq_enter() will not work. For
example, in trace events.

Another helpful aspect of this change is that it also moves the per cpu
variable called in the RCU critical section into a cache locale along with
other RCU per cpu variables used in that same location.

I'm keeping the stack_trace_disable() code, as that still could be used in
the future by places that really need to disable it. And since it's only a
static inline, it wont take up any kernel text if it is not used.

Link: http://lkml.kernel.org/r/20170405093207.404f8deb@gandalf.local.home

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-10 15:22:03 -04:00
Ingo Molnar
f9411ebe3d rcu: Separate the RCU synchronization types and APIs into <linux/rcupdate_wait.h>
So rcupdate.h is a pretty complex header, in particular it includes
<linux/completion.h> which includes <linux/wait.h> - creating a
dependency that includes <linux/wait.h> in <linux/sched.h>,
which prevents the isolation of <linux/sched.h> from the derived
<linux/wait.h> header.

Solve part of the problem by decoupling rcupdate.h from completions:
this can be done by separating out the rcu_synchronize types and APIs,
and updating their usage sites.

Since this is a mostly RCU-internal types this will not just simplify
<linux/sched.h>'s dependencies, but will make all the hundreds of
.c files that include rcupdate.h but not completions or wait.h build
faster.

( For rcutiny this means that two dependent APIs have to be uninlined,
  but that shouldn't be much of a problem as they are rare variants. )

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:24 +01:00
Paul E. McKenney
d85b62f18d srcu: Force full grace-period ordering
If a process invokes synchronize_srcu(), is delayed just the right amount
of time, and thus does not sleep when waiting for the grace period to
complete, there is no ordering between the end of the grace period and
the code following the synchronize_srcu().  Similarly, there can be a
lack of ordering between the end of the SRCU grace period and callback
invocation.

This commit adds the necessary ordering.

Reported-by: Lance Roy <ldr709@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Further smp_mb() adjustment per email with Lance Roy. ]
2017-01-25 12:54:22 -08:00
Paul E. McKenney
52d7e48b86 rcu: Narrow early boot window of illegal synchronous grace periods
The current preemptible RCU implementation goes through three phases
during bootup.  In the first phase, there is only one CPU that is running
with preemption disabled, so that a no-op is a synchronous grace period.
In the second mid-boot phase, the scheduler is running, but RCU has
not yet gotten its kthreads spawned (and, for expedited grace periods,
workqueues are not yet running.  During this time, any attempt to do
a synchronous grace period will hang the system (or complain bitterly,
depending).  In the third and final phase, RCU is fully operational and
everything works normally.

This has been OK for some time, but there has recently been some
synchronous grace periods showing up during the second mid-boot phase.
This code worked "by accident" for awhile, but started failing as soon
as expedited RCU grace periods switched over to workqueues in commit
8b355e3bc1 ("rcu: Drive expedited grace periods from workqueue").
Note that the code was buggy even before this commit, as it was subject
to failure on real-time systems that forced all expedited grace periods
to run as normal grace periods (for example, using the rcu_normal ksysfs
parameter).  The callchain from the failure case is as follows:

early_amd_iommu_init()
|-> acpi_put_table(ivrs_base);
|-> acpi_tb_put_table(table_desc);
|-> acpi_tb_invalidate_table(table_desc);
|-> acpi_tb_release_table(...)
|-> acpi_os_unmap_memory
|-> acpi_os_unmap_iomem
|-> acpi_os_map_cleanup
|-> synchronize_rcu_expedited

The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
which caused the code to try using workqueues before they were
initialized, which did not go well.

This commit therefore reworks RCU to permit synchronous grace periods
to proceed during this mid-boot phase.  This commit is therefore a
fix to a regression introduced in v4.9, and is therefore being put
forward post-merge-window in v4.10.

This commit sets a flag from the existing rcu_scheduler_starting()
function which causes all synchronous grace periods to take the expedited
path.  The expedited path now checks this flag, using the requesting task
to drive the expedited grace period forward during the mid-boot phase.
Finally, this flag is updated by a core_initcall() function named
rcu_exp_runtime_mode(), which causes the runtime codepaths to be used.

Note that this arrangement assumes that tasks are not sent POSIX signals
(or anything similar) from the time that the first task is spawned
through core_initcall() time.

Fixes: 8b355e3bc1 ("rcu: Drive expedited grace periods from workqueue")
Reported-by: "Zheng, Lv" <lv.zheng@intel.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Stan Kain <stan.kain@gmail.com>
Tested-by: Ivan <waffolz@hotmail.com>
Tested-by: Emanuel Castelo <emanuel.castelo@gmail.com>
Tested-by: Bruno Pesavento <bpesavento@infinito.it>
Tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Frederic Bezies <fredbezies@gmail.com>
Cc: <stable@vger.kernel.org> # 4.9.0-
2017-01-14 21:23:48 -08:00
Paul E. McKenney
7ec99de36f rcu: Provide exact CPU-online tracking for RCU
Up to now, RCU has assumed that the CPU-online process makes it from
CPU_UP_PREPARE to set_cpu_online() within one jiffy.  Given the recent
rise of virtualized environments, this assumption is very clearly
obsolete.  Failing to meet this deadline can result in RCU paying
attention to an incoming CPU for one jiffy, then ignoring it until the
grace period following the one in which that CPU sets itself online.
This situation might prove to be fatally disappointing to any RCU
read-side critical sections that had the misfortune to execute during
the time in which RCU was ignoring the slow-to-come-online CPU.

This commit therefore updates RCU's internal CPU state-tracking
information at notify_cpu_starting() time, thus providing RCU with
an exact transition of the CPU's state from offline to online.

Note that this means that incoming CPUs must not use RCU read-side
critical section (other than those of SRCU) until notify_cpu_starting()
time.  Note also that the CPU_STARTING notifiers -are- allowed to use
RCU read-side critical sections.  (Of course, CPU-hotplug notifiers are
rapidly becoming obsolete, so you need to act fast!)

If a given architecture or CPU family needs to use RCU read-side
critical sections earlier, the call to rcu_cpu_starting() from
notify_cpu_starting() will need to be architecture-specific, with
architectures that need early use being required to hand-place
the call to rcu_cpu_starting() at some point preceding the call to
notify_cpu_starting().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 09:36:57 -07:00
Linus Torvalds
468fc7ed55 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Unified UDP encapsulation offload methods for drivers, from
    Alexander Duyck.

 2) Make DSA binding more sane, from Andrew Lunn.

 3) Support QCA9888 chips in ath10k, from Anilkumar Kolli.

 4) Several workqueue usage cleanups, from Bhaktipriya Shridhar.

 5) Add XDP (eXpress Data Path), essentially running BPF programs on RX
    packets as soon as the device sees them, with the option to mirror
    the packet on TX via the same interface.  From Brenden Blanco and
    others.

 6) Allow qdisc/class stats dumps to run lockless, from Eric Dumazet.

 7) Add VLAN support to b53 and bcm_sf2, from Florian Fainelli.

 8) Simplify netlink conntrack entry layout, from Florian Westphal.

 9) Add ipv4 forwarding support to mlxsw spectrum driver, from Ido
    Schimmel, Yotam Gigi, and Jiri Pirko.

10) Add SKB array infrastructure and convert tun and macvtap over to it.
    From Michael S Tsirkin and Jason Wang.

11) Support qdisc packet injection in pktgen, from John Fastabend.

12) Add neighbour monitoring framework to TIPC, from Jon Paul Maloy.

13) Add NV congestion control support to TCP, from Lawrence Brakmo.

14) Add GSO support to SCTP, from Marcelo Ricardo Leitner.

15) Allow GRO and RPS to function on macsec devices, from Paolo Abeni.

16) Support MPLS over IPV4, from Simon Horman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
  xgene: Fix build warning with ACPI disabled.
  be2net: perform temperature query in adapter regardless of its interface state
  l2tp: Correctly return -EBADF from pppol2tp_getname.
  net/mlx5_core/health: Remove deprecated create_singlethread_workqueue
  net: ipmr/ip6mr: update lastuse on entry change
  macsec: ensure rx_sa is set when validation is disabled
  tipc: dump monitor attributes
  tipc: add a function to get the bearer name
  tipc: get monitor threshold for the cluster
  tipc: make cluster size threshold for monitoring configurable
  tipc: introduce constants for tipc address validation
  net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update()
  MAINTAINERS: xgene: Add driver and documentation path
  Documentation: dtb: xgene: Add MDIO node
  dtb: xgene: Add MDIO node
  drivers: net: xgene: ethtool: Use phy_ethtool_gset and sset
  drivers: net: xgene: Use exported functions
  drivers: net: xgene: Enable MDIO driver
  drivers: net: xgene: Add backward compatibility
  drivers: net: phy: xgene: Add MDIO driver
  ...
2016-07-27 12:03:20 -07:00
Paul E. McKenney
995f140561 rcu: Suppress sparse warnings for rcu_dereference_raw()
Data structures that are used both with and without RCU protection
are difficult to write in a sparse-clean manner.  If you mark the
relevant pointers with __rcu, sparse will complain about all non-RCU
uses, but if you don't mark those pointers, sparse will complain about
all RCU uses.

This commit therefore suppresses sparse warnings for rcu_dereference_raw(),
allowing mixed-protection data structures to avoid these warnings.

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2016-07-06 10:51:14 +01:00
Paul E. McKenney
4929c913bd rcu: Make call_rcu_tasks() tolerate first call with irqs disabled
Currently, if the very first call to call_rcu_tasks() has irqs disabled,
it will create the rcu_tasks_kthread with irqs disabled, which will
result in a splat in the memory allocator, which kthread_run() invokes
with the expectation that irqs are enabled.

This commit fixes this problem by deferring kthread creation if called
with irqs disabled.  The first call to call_rcu_tasks() that has irqs
enabled will create the kthread.

This bug was detected by rcutorture changes that were motivated by
Iftekhar Ahmed's mutation-testing efforts.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-15 15:45:00 -07:00
Paul E. McKenney
3a37f7275c rcu: No ordering for rcu_assign_pointer() of NULL
This commit does a compile-time check for rcu_assign_pointer() of NULL,
and uses WRITE_ONCE() rather than smp_store_release() in that case.

Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-15 15:31:28 -07:00
Paul E. McKenney
810ce8b5df rcu: Document RCU_NONIDLE() restrictions in comment header
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-14 16:01:42 -07:00
Boqun Feng
293e2421fe rcu: Remove superfluous versions of rcu_read_lock_sched_held()
Currently, we have four versions of rcu_read_lock_sched_held(), depending
on the combined choices on PREEMPT_COUNT and DEBUG_LOCK_ALLOC.  However,
there is an existing function preemptible() that already distinguishes
between the PREEMPT_COUNT=y and PREEMPT_COUNT=n cases, and allows these
four implementations to be consolidated down to two.

This commit therefore uses preemptible() to achieve this consolidation.

Note that there could be a small performance regression in the case
of CONFIG_DEBUG_LOCK_ALLOC=y && PREEMPT_COUNT=n.  However, given the
overhead associated with CONFIG_DEBUG_LOCK_ALLOC=y, this should be
down in the noise.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:50 -07:00
Paul E. McKenney
274529ba9b rcu: Consolidate dumping of ftrace buffer
This commit consolidates a couple definitions and several calls for
single-shot ftrace-buffer dumping.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:29:08 -07:00
Linus Torvalds
710d60cbf1 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu hotplug updates from Thomas Gleixner:
 "This is the first part of the ongoing cpu hotplug rework:

   - Initial implementation of the state machine

   - Runs all online and prepare down callbacks on the plugged cpu and
     not on some random processor

   - Replaces busy loop waiting with completions

   - Adds tracepoints so the states can be followed"

More detailed commentary on this work from an earlier email:
 "What's wrong with the current cpu hotplug infrastructure?

   - Asymmetry

     The hotplug notifier mechanism is asymmetric versus the bringup and
     teardown.  This is mostly caused by the notifier mechanism.

   - Largely undocumented dependencies

     While some notifiers use explicitely defined notifier priorities,
     we have quite some notifiers which use numerical priorities to
     express dependencies without any documentation why.

   - Control processor driven

     Most of the bringup/teardown of a cpu is driven by a control
     processor.  While it is understandable, that preperatory steps,
     like idle thread creation, memory allocation for and initialization
     of essential facilities needs to be done before a cpu can boot,
     there is no reason why everything else must run on a control
     processor.  Before this patch series, bringup looks like this:

       Control CPU                     Booting CPU

       do preparatory steps
       kick cpu into life

                                       do low level init

       sync with booting cpu           sync with control cpu

       bring the rest up

   - All or nothing approach

     There is no way to do partial bringups.  That's something which is
     really desired because we waste e.g.  at boot substantial amount of
     time just busy waiting that the cpu comes to life.  That's stupid
     as we could very well do preparatory steps and the initial IPI for
     other cpus and then go back and do the necessary low level
     synchronization with the freshly booted cpu.

   - Minimal debuggability

     Due to the notifier based design, it's impossible to switch between
     two stages of the bringup/teardown back and forth in order to test
     the correctness.  So in many hotplug notifiers the cancel
     mechanisms are either not existant or completely untested.

   - Notifier [un]registering is tedious

     To [un]register notifiers we need to protect against hotplug at
     every callsite.  There is no mechanism that bringup/teardown
     callbacks are issued on the online cpus, so every caller needs to
     do it itself.  That also includes error rollback.

  What's the new design?

     The base of the new design is a symmetric state machine, where both
     the control processor and the booting/dying cpu execute a well
     defined set of states.  Each state is symmetric in the end, except
     for some well defined exceptions, and the bringup/teardown can be
     stopped and reversed at almost all states.

     So the bringup of a cpu will look like this in the future:

       Control CPU                     Booting CPU

       do preparatory steps
       kick cpu into life

                                       do low level init

       sync with booting cpu           sync with control cpu

                                       bring itself up

     The synchronization step does not require the control cpu to wait.
     That mechanism can be done asynchronously via a worker or some
     other mechanism.

     The teardown can be made very similar, so that the dying cpu cleans
     up and brings itself down.  Cleanups which need to be done after
     the cpu is gone, can be scheduled asynchronously as well.

  There is a long way to this, as we need to refactor the notion when a
  cpu is available.  Today we set the cpu online right after it comes
  out of the low level bringup, which is not really correct.

  The proper mechanism is to set it to available, i.e. cpu local
  threads, like softirqd, hotplug thread etc. can be scheduled on that
  cpu, and once it finished all booting steps, it's set to online, so
  general workloads can be scheduled on it.  The reverse happens on
  teardown.  First thing to do is to forbid scheduling of general
  workloads, then teardown all the per cpu resources and finally shut it
  off completely.

  This patch series implements the basic infrastructure for this at the
  core level.  This includes the following:

   - Basic state machine implementation with well defined states, so
     ordering and prioritization can be expressed.

   - Interfaces to [un]register state callbacks

     This invokes the bringup/teardown callback on all online cpus with
     the proper protection in place and [un]installs the callbacks in
     the state machine array.

     For callbacks which have no particular ordering requirement we have
     a dynamic state space, so that drivers don't have to register an
     explicit hotplug state.

     If a callback fails, the code automatically does a rollback to the
     previous state.

   - Sysfs interface to drive the state machine to a particular step.

     This is only partially functional today.  Full functionality and
     therefor testability will be achieved once we converted all
     existing hotplug notifiers over to the new scheme.

   - Run all CPU_ONLINE/DOWN_PREPARE notifiers on the booting/dying
     processor:

       Control CPU                     Booting CPU

       do preparatory steps
       kick cpu into life

                                       do low level init

       sync with booting cpu           sync with control cpu
       wait for boot
                                       bring itself up

                                       Signal completion to control cpu

     In a previous step of this work we've done a full tree mechanical
     conversion of all hotplug notifiers to the new scheme.  The balance
     is a net removal of about 4000 lines of code.

     This is not included in this series, as we decided to take a
     different approach.  Instead of mechanically converting everything
     over, we will do a proper overhaul of the usage sites one by one so
     they nicely fit into the symmetric callback scheme.

     I decided to do that after I looked at the ugliness of some of the
     converted sites and figured out that their hotplug mechanism is
     completely buggered anyway.  So there is no point to do a
     mechanical conversion first as we need to go through the usage
     sites one by one again in order to achieve a full symmetric and
     testable behaviour"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  cpu/hotplug: Document states better
  cpu/hotplug: Fix smpboot thread ordering
  cpu/hotplug: Remove redundant state check
  cpu/hotplug: Plug death reporting race
  rcu: Make CPU_DYING_IDLE an explicit call
  cpu/hotplug: Make wait for dead cpu completion based
  cpu/hotplug: Let upcoming cpu bring itself fully up
  arch/hotplug: Call into idle with a proper state
  cpu/hotplug: Move online calls to hotplugged cpu
  cpu/hotplug: Create hotplug threads
  cpu/hotplug: Split out the state walk into functions
  cpu/hotplug: Unpark smpboot threads from the state machine
  cpu/hotplug: Move scheduler cpu_online notifier to hotplug core
  cpu/hotplug: Implement setup/removal interface
  cpu/hotplug: Make target state writeable
  cpu/hotplug: Add sysfs state interface
  cpu/hotplug: Hand in target state to _cpu_up/down
  cpu/hotplug: Convert the hotplugged cpu work to a state machine
  cpu/hotplug: Convert to a state machine for the control processor
  cpu/hotplug: Add tracepoints
  ...
2016-03-15 13:50:29 -07:00
Thomas Gleixner
27d50c7eeb rcu: Make CPU_DYING_IDLE an explicit call
Make the RCU CPU_DYING_IDLE callback an explicit function call, so it gets
invoked at the proper place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Rik van Riel <riel@redhat.com>
Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: http://lkml.kernel.org/r/20160226182341.870167933@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-01 20:36:58 +01:00
Yang Shi
3500efae44 rcu: Remove rcu_user_hooks_switch
Because there are neither uses nor intended uses for the
rcu_user_hooks_switch() function that was orginally intended
for nohz use, this commit removes it.

Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 19:59:56 -08:00
Paul E. McKenney
648c630c64 Merge branches 'doc.2015.12.05a', 'exp.2015.12.07a', 'fixes.2015.12.07a', 'list.2015.12.04b' and 'torture.2015.12.05a' into HEAD
doc.2015.12.05a:  Documentation updates
exp.2015.12.07a:  Expedited grace-period updates
fixes.2015.12.07a:  Miscellaneous fixes
list.2015.12.04b:  Linked-list updates
torture.2015.12.05a:  Torture-test updates
2015-12-07 17:02:54 -08:00
Alexey Kardashevskiy
f039f0af08 rcu: Fix comment for rcu_dereference_raw_notrace
rcu_dereference_raw() calls indirectly rcu_read_lock_held() while
rcu_dereference_raw_notrace() does not so fix the comment about the latter.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07 17:01:32 -08:00
Paul E. McKenney
7c9906ca5e rcu: Don't redundantly disable irqs in rcu_irq_{enter,exit}()
This commit replaces a local_irq_save()/local_irq_restore() pair with
a lockdep assertion that interrupts are already disabled.  This should
remove the corresponding overhead from the interrupt entry/exit fastpaths.

This change was inspired by the fact that Iftekhar Ahmed's mutation
testing showed that removing rcu_irq_enter()'s call to local_ird_restore()
had no effect, which might indicate that interrupts were always enabled
anyway.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-07 17:01:31 -08:00
Paul E. McKenney
79cfea0273 rcu: Remove TINY_RCU bloat from pointless boot parameters
The rcu_expedited, rcu_normal, and rcu_normal_after_boot kernel boot
parameters are pointless in the case of TINY_RCU because in that case
synchronous grace periods, both expedited and normal, are no-ops.
However, these three symbols contribute several hundred bytes of bloat.
This commit therefore uses CPP directives to avoid compiling this code
in TINY_RCU kernels.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-12-07 16:59:37 -08:00
Paul E. McKenney
5a9be7c628 rcu: Add rcu_normal kernel parameter to suppress expediting
Although expedited grace periods can be quite useful, and although their
OS jitter has been greatly reduced, they can still pose problems for
extreme real-time workloads.  This commit therefore adds a rcu_normal
kernel boot parameter (which can also be manipulated via sysfs)
to suppress expedited grace periods, that is, to treat requests for
expedited grace periods as if they were requests for normal grace periods.
If both rcu_expedited and rcu_normal are specified, rcu_normal wins.
This means that if you are relying on expedited grace periods to speed up
boot, you will want to specify rcu_expedited on the kernel command line,
and then specify rcu_normal via sysfs once boot completes.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:53 -08:00
Paul E. McKenney
e62e3f620b rcu: Remove deprecated rcu_lockdep_assert()
The old rcu_lockdep_assert() was retained to ease handling of incoming
patches, but any use will result in deprecated warnings.  However, its
replacement, RCU_LOCKDEP_WARN(), is now upstream.  It is therefore
time to remove rcu_lockdep_assert(), which this commit does.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:16:31 -07:00
Paul E. McKenney
c3ac7cf184 rcu: Add rcu_pointer_handoff()
This commit adds an rcu_pointer_handoff() that is intended to mark
situations where a structure's protection transitions from RCU to some
other mechanism (locking, reference counting, whatever).  These markings
should allow external tools to more easily spot bugs involving leaking
pointers out of RCU read-side critical sections.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-06 11:16:18 -07:00
Boqun Feng
bb73c52bad rcu: Don't disable preemption for Tiny and Tree RCU readers
Because preempt_disable() maps to barrier() for non-debug builds,
it forces the compiler to spill and reload registers.  Because Tree
RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these
barrier() instances generate needless extra code for each instance of
rcu_read_lock() and rcu_read_unlock().  This extra code slows down Tree
RCU and bloats Tiny RCU.

This commit therefore removes the preempt_disable() and preempt_enable()
from the non-preemptible implementations of __rcu_read_lock() and
__rcu_read_unlock(), respectively.  However, for debug purposes,
preempt_disable() and preempt_enable() are still invoked if
CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside
atomic sections in non-preemptible kernels.

However, Tiny and Tree RCU operates by coalescing all RCU read-side
critical sections on a given CPU that lie between successive quiescent
states.  It is therefore necessary to compensate for removing barriers
from __rcu_read_lock() and __rcu_read_unlock() by adding them to a
couple of the RCU functions invoked during quiescent states, namely to
rcu_all_qs() and rcu_note_context_switch().  However, note that the latter
is more paranoia than necessity, at least until link-time optimizations
become more aggressive.

This is based on an earlier patch by Paul E. McKenney, fixing
a bug encountered in kernels built with CONFIG_PREEMPT=n and
CONFIG_PREEMPT_COUNT=y.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-06 11:08:23 -07:00
Boqun Feng
b6a4ae766e rcu: Use rcu_callback_t in call_rcu*() and friends
As we now have rcu_callback_t typedefs as the type of rcu callbacks, we
should use it in call_rcu*() and friends as the type of parameters. This
could save us a few lines of code and make it clear which function
requires an rcu callbacks rather than other callbacks as its argument.

Besides, this can also help cscope to generate a better database for
code reading.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:08:05 -07:00
Oleg Nesterov
66e8c57da6 rcu: Change _wait_rcu_gp() to work around GCC bug 67055
Code like this in inline functions confuses some recent versions of gcc:

	const int n = const-expr;
	whatever_t array[n];

For more details, see:

	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67055#c13

This compiler bug results in the following failure after 114b7fd4b (rcu:
Create rcu_sync infrastructure):

	In file included from include/linux/rcupdate.h:429:0,
			  from include/linux/rcu_sync.h:5,
			  from kernel/rcu/sync.c:1:
	include/linux/rcutiny.h: In function 'rcu_barrier_sched':
	include/linux/rcutiny.h:55:20: internal compiler error: Segmentation fault
	  static inline void rcu_barrier_sched(void)

This commit therefore eliminates the constant local variable in favor of
direct use of the expression.

Reported-and-tested-by: Mark Salter <msalter@redhat.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 20:50:06 -07:00
Paul E. McKenney
f78f5b90c4 rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()
This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for
consistency with the WARN() series of macros.  This also requires
inverting the sense of the conditional, which this commit also does.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2015-07-22 15:27:32 -07:00
Paul E. McKenney
ec90a194ae rcu: Create a synchronize_rcu_mult()
There have been several requests for a primitive that waits for
grace periods for several RCU flavors concurrently, so this
commit creates it.  This is a variadic macro, and you pass in
the call_rcu() functions of the flavors of RCU that you wish to
wait for.

Note that you cannot pass in call_srcu() for two reasons: (1) This
would result in a type mismatch and (2) You need to specify which
srcu_struct you want to use.  Handle this by creating a wrapper
function for your SRCU domain, for example:

	void call_srcu_mine(struct rcu_head *head, rcu_callback_t func)
	{
		call_srcu(&ss_mine, head, func);
	}

You can then do something like this:

	synchronize_rcu_mult(call_srcu_mine, call_rcu, call_rcu_sched);

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-22 15:27:29 -07:00
Peter Zijlstra
155d1d1278 rcu: Use WRITE_ONCE in RCU_INIT_POINTER
For the paranoid amongst us GCC would be in its right to use byte stores
to write our NULL value, tell it not to do that.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-22 15:27:27 -07:00
Denys Vlasenko
d5671f6bf2 rcu: Deinline rcu_read_lock_sched_held() if DEBUG_LOCK_ALLOC
DEBUG_LOCK_ALLOC=y is not a production setting, but it is
not very unusual either. Many developers routinely
use kernels built with it enabled.

Apart from being selected by hand, it is also auto-selected by
PROVE_LOCKING "Lock debugging: prove locking correctness" and
LOCK_STAT "Lock usage statistics" config options.
LOCK STAT is necessary for "perf lock" to work.

I wouldn't spend too much time optimizing it, but this particular
function has a very large cost in code size: when it is deinlined,
code size decreases by 830,000 bytes:

    text     data      bss       dec     hex filename
85674192 22294776 20627456 128596424 7aa39c8 vmlinux.before
84837612 22294424 20627456 127759492 79d7484 vmlinux

(with this config: http://busybox.net/~vda/kernel_config)

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Tejun Heo <tj@kernel.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: linux-kernel@vger.kernel.org
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:43:52 -07:00
Paul E. McKenney
d1ec4c34c7 rcu: Drop RCU_USER_QS in favor of NO_HZ_FULL
The RCU_USER_QS Kconfig parameter is now just a synonym for NO_HZ_FULL,
so this commit eliminates RCU_USER_QS, replacing all uses with NO_HZ_FULL.

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-07-06 13:52:18 -07:00
Linus Torvalds
02201e3f1b Minor merge needed, due to function move.
Main excitement here is Peter Zijlstra's lockless rbtree optimization to
 speed module address lookup.  He found some abusers of the module lock
 doing that too.
 
 A little bit of parameter work here too; including Dan Streetman's breaking
 up the big param mutex so writing a parameter can load another module (yeah,
 really).  Unfortunately that broke the usual suspects, !CONFIG_MODULES and
 !CONFIG_SYSFS, so those fixes were appended too.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVkgKHAAoJENkgDmzRrbjxQpwQAJVmBN6jF3SnwbQXv9vRixjH
 58V33sb1G1RW+kXxQ3/e8jLX/4VaN479CufruXQp+IJWXsN/CH0lbC3k8m7u50d7
 b1Zeqd/Yrh79rkc11b0X1698uGCSMlzz+V54Z0QOTEEX+nSu2ZZvccFS4UaHkn3z
 rqDo00lb7rxQz8U25qro2OZrG6D3ub2q20TkWUB8EO4AOHkPn8KWP2r429Axrr0K
 wlDWDTTt8/IsvPbuPf3T15RAhq1avkMXWn9nDXDjyWbpLfTn8NFnWmtesgY7Jl4t
 GjbXC5WYekX3w2ZDB9KaT/DAMQ1a7RbMXNSz4RX4VbzDl+yYeSLmIh2G9fZb1PbB
 PsIxrOgy4BquOWsJPm+zeFPSC3q9Cfu219L4AmxSjiZxC3dlosg5rIB892Mjoyv4
 qxmg6oiqtc4Jxv+Gl9lRFVOqyHZrTC5IJ+xgfv1EyP6kKMUKLlDZtxZAuQxpUyxR
 HZLq220RYnYSvkWauikq4M8fqFM8bdt6hLJnv7bVqllseROk9stCvjSiE3A9szH5
 OgtOfYV5GhOeb8pCZqJKlGDw+RoJ21jtNCgOr6DgkNKV9CX/kL/Puwv8gnA0B0eh
 dxCeB7f/gcLl7Cg3Z3gVVcGlgak6JWrLf5ITAJhBZ8Lv+AtL2DKmwEWS/iIMRmek
 tLdh/a9GiCitqS0bT7GE
 =tWPQ
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Main excitement here is Peter Zijlstra's lockless rbtree optimization
  to speed module address lookup.  He found some abusers of the module
  lock doing that too.

  A little bit of parameter work here too; including Dan Streetman's
  breaking up the big param mutex so writing a parameter can load
  another module (yeah, really).  Unfortunately that broke the usual
  suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
  appended too"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
  modules: only use mod->param_lock if CONFIG_MODULES
  param: fix module param locks when !CONFIG_SYSFS.
  rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
  module: add per-module param_lock
  module: make perm const
  params: suppress unused variable error, warn once just in case code changes.
  modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
  kernel/module.c: avoid ifdefs for sig_enforce declaration
  kernel/workqueue.c: remove ifdefs over wq_power_efficient
  kernel/params.c: export param_ops_bool_enable_only
  kernel/params.c: generalize bool_enable_only
  kernel/module.c: use generic module param operaters for sig_enforce
  kernel/params: constify struct kernel_param_ops uses
  sysfs: tightened sysfs permission checks
  module: Rework module_addr_{min,max}
  module: Use __module_address() for module_address_lookup()
  module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
  module: Optimize __module_address() using a latched RB-tree
  rbtree: Implement generic latch_tree
  seqlock: Introduce raw_read_seqcount_latch()
  ...
2015-07-01 10:49:25 -07:00
Linus Torvalds
43224b96af Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "A rather largish update for everything time and timer related:

   - Cache footprint optimizations for both hrtimers and timer wheel

   - Lower the NOHZ impact on systems which have NOHZ or timer migration
     disabled at runtime.

   - Optimize run time overhead of hrtimer interrupt by making the clock
     offset updates smarter

   - hrtimer cleanups and removal of restrictions to tackle some
     problems in sched/perf

   - Some more leap second tweaks

   - Another round of changes addressing the 2038 problem

   - First step to change the internals of clock event devices by
     introducing the necessary infrastructure

   - Allow constant folding for usecs/msecs_to_jiffies()

   - The usual pile of clockevent/clocksource driver updates

  The hrtimer changes contain updates to sched, perf and x86 as they
  depend on them plus changes all over the tree to cleanup API changes
  and redundant code, which got copied all over the place.  The y2038
  changes touch s390 to remove the last non 2038 safe code related to
  boot/persistant clock"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  clocksource: Increase dependencies of timer-stm32 to limit build wreckage
  timer: Minimize nohz off overhead
  timer: Reduce timer migration overhead if disabled
  timer: Stats: Simplify the flags handling
  timer: Replace timer base by a cpu index
  timer: Use hlist for the timer wheel hash buckets
  timer: Remove FIFO "guarantee"
  timers: Sanitize catchup_timer_jiffies() usage
  hrtimer: Allow hrtimer::function() to free the timer
  seqcount: Introduce raw_write_seqcount_barrier()
  seqcount: Rename write_seqcount_barrier()
  hrtimer: Fix hrtimer_is_queued() hole
  hrtimer: Remove HRTIMER_STATE_MIGRATE
  selftest: Timers: Avoid signal deadlock in leap-a-day
  timekeeping: Copy the shadow-timekeeper over the real timekeeper last
  clockevents: Check state instead of mode in suspend/resume path
  selftests: timers: Add leap-second timer edge testing to leap-a-day.c
  ntp: Do leapsecond adjustment in adjtimex read path
  time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge
  ntp: Introduce and use SECS_PER_DAY macro instead of 86400
  ...
2015-06-22 18:57:44 -07:00
Peter Zijlstra
0a04b01669 rcu: Move lockless_dereference() out of rcupdate.h
I want to use lockless_dereference() from seqlock.h, which would mean
including rcupdate.h from it, however rcupdate.h already includes
seqlock.h.

Avoid this by moving lockless_dereference() into compiler.h. This is
somewhat tricky since it uses smp_read_barrier_depends() which isn't
available there, but its a CPP macro so we can get away with it.

The alternative would be moving it into asm/barrier.h, but that would
be updating each arch (I can do if people feel that is more
appropriate).

Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-05-28 11:32:05 +09:30
Paul E. McKenney
0868aa2216 Merge branches 'array.2015.05.27a', 'doc.2015.05.27a', 'fixes.2015.05.27a', 'hotplug.2015.05.27a', 'init.2015.05.27a', 'tiny.2015.05.27a' and 'torture.2015.05.27a' into HEAD
array.2015.05.27a:  Remove all uses of RCU-protected array indexes.
doc.2015.05.27a:  Docuemntation updates.
fixes.2015.05.27a:  Miscellaneous fixes.
hotplug.2015.05.27a:  CPU-hotplug updates.
init.2015.05.27a:  Initialization/Kconfig updates.
tiny.2015.05.27a:  Updates to Tiny RCU.
torture.2015.05.27a:  Torture-testing updates.
2015-05-27 13:00:49 -07:00
Paul E. McKenney
51952bc633 rcu: Further shrink Tiny RCU by making empty functions static inlines
The Tiny RCU counterparts to rcu_idle_enter(), rcu_idle_exit(),
rcu_irq_enter(), and rcu_irq_exit() are empty functions, but each has
EXPORT_SYMBOL_GPL(), which needlessly consumes extra memory, especially
in kernels built with module support.  This commit therefore moves these
functions to static inlines in rcutiny.h, removing the need for exports.

This won't affect the size of the tiniest kernels, which are likely
built without module support, but might help semi-tiny kernels that
might include module support.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-05-27 12:59:31 -07:00
Paul E. McKenney
3382adbc1b rcu: Eliminate a few CONFIG_RCU_NOCB_CPU_ALL #ifdefs
This commit converts several CONFIG_RCU_NOCB_CPU_ALL #ifdefs to
instead use IS_ENABLED().  This change should help avoid hiding
code from compiler diagnostics.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27 12:58:00 -07:00
Paul E. McKenney
1ebee8017d rcu: Eliminate array-index-based RCU primitives
Now that rcu_access_index() and rcu_dereference_index_check() are no
longer used, the commit removes them from the RCU API.  This means that
RCU's data dependencies now involve only pointers, give or take the
occasional cast to and then back from an integer type to do pointer
arithmetic.  This in turn eliminates the need for a number of operations
on values carrying RCU data dependencies.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: linux-edac@vger.kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
2015-05-27 12:56:16 -07:00
Paul E. McKenney
7d0ae8086b rcu: Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
This commit moves from the old ACCESS_ONCE() API to the new READ_ONCE()
and WRITE_ONCE() APIs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck:  Updated to include kernel/torture.c as suggested by Jason Low. ]
2015-05-27 12:56:15 -07:00
Thomas Gleixner
c1ad348b45 tick: Nohz: Rework next timer evaluation
The evaluation of the next timer in the nohz code is based on jiffies
while all the tick internals are nano seconds based. We have also to
convert hrtimer nanoseconds to jiffies in the !highres case. That's
just wrong and introduces interesting corner cases.

Turn it around and convert the next timer wheel timer expiry and the
rcu event to clock monotonic and base all calculations on
nanoseconds. That identifies the case where no timer is pending
clearly with an absolute expiry value of KTIME_MAX.

Makes the code more readable and gets rid of the jiffies magic in the
nohz code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Link: http://lkml.kernel.org/r/20150414203502.184198593@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:50 +02:00
Paul E. McKenney
42528795ac Merge branches 'doc.2015.02.26a', 'earlycb.2015.03.03a', 'fixes.2015.03.03a', 'gpexp.2015.02.26a', 'hotplug.2015.03.20a', 'sysidle.2015.02.26b' and 'tiny.2015.02.26a' into HEAD
doc.2015.02.26a:  Documentation changes
earlycb.2015.03.03a:  Permit early-boot RCU callbacks
fixes.2015.03.03a:  Miscellaneous fixes
gpexp.2015.02.26a:  In-kernel expediting of normal grace periods
hotplug.2015.03.20a:  CPU hotplug fixes
sysidle.2015.02.26b:  NO_HZ_FULL_SYSIDLE fixes
tiny.2015.02.26a:  TINY_RCU fixes
2015-03-20 08:31:01 -07:00
Paul E. McKenney
88428cc5c2 rcu: Handle outgoing CPUs on exit from idle loop
This commit informs RCU of an outgoing CPU just before that CPU invokes
arch_cpu_idle_dead() during its last pass through the idle loop (via a
new CPU_DYING_IDLE notifier value).  This change means that RCU need not
deal with outgoing CPUs passing through the scheduler after informing
RCU that they are no longer online.  Note that removing the CPU from
the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
and orphaning callbacks is still done at CPU_DEAD time, the reason being
that at CPU_DEAD time we have another CPU that can adopt them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-12 15:19:38 -07:00
Paul E. McKenney
b826565aaf rcu: Reverse rcu_dereference_check() conditions
The rcu_dereference_check() family of primitives evaluates the RCU
lockdep expression first, and only then evaluates the expression passed
in.  This works fine normally, but can potentially fail in environments
(such as NMI handlers) where lockdep cannot be invoked.  The problem is
that even if the expression passed in is "1", the compiler would need to
prove that the RCU lockdep expression (rcu_read_lock_held(), for example)
is free of side effects in order to be able to elide it.  Given that
rcu_read_lock_held() is sometimes separately compiled, the compiler cannot
always use this optimization.

This commit therefore reverse the order of evaluation, so that the
expression passed in is evaluated first, and the RCU lockdep expression is
evaluated only if the passed-in expression evaluated to false, courtesy
of the C-language short-circuit boolean evaluation rules.  This compells
the compiler to forego executing the RCU lockdep expression in cases
where the passed-in expression evaluates to "1" at compile time, so that
(for example) rcu_dereference_raw() can be guaranteed to execute safely
within an NMI handler.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2015-03-03 11:16:47 -08:00
Paul E. McKenney
d24209bb68 rcu: Improve diagnostics for blocked critical sections in irq
If an RCU read-side critical section occurs within an interrupt handler
or a softirq handler, it cannot have been preempted.  Therefore, there is
a check in rcu_read_unlock_special() checking for this error.  However,
when this check triggers, it lacks diagnostic information.  This commit
therefore moves rcu_read_unlock()'s lockdep annotation to follow the
call to __rcu_read_unlock() and changes rcu_read_unlock_special()'s
WARN_ON_ONCE() to an lockdep_rcu_suspicious() in order to locate where
the offending RCU read-side critical section began.  In addition, the
value of the ->rcu_read_unlock_special field is printed.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-03 11:16:00 -08:00
Paul E. McKenney
ee42571f43 rcu: Add Kconfig option to expedite grace periods during boot
This commit adds a CONFIG_RCU_EXPEDITE_BOOT Kconfig parameter
that emulates a very early boot rcu_expedite_gp().  A late-boot
call to rcu_end_inkernel_boot() will provide the corresponding
rcu_unexpedite_gp().  The late-boot call to rcu_end_inkernel_boot()
should be made just before init is spawned.

According to Arjan:

> To show the boot time, I'm using the timestamp of the "Write protecting"
> line, that's pretty much the last thing we print prior to ring 3 execution.
>
> A kernel with default RCU behavior (inside KVM, only virtual devices)
> looks like this:
>
> [    0.038724] Write protecting the kernel read-only data: 10240k
>
> a kernel with expedited RCU (using the command line option, so that I
> don't have to recompile between measurements and thus am completely
> oranges-to-oranges)
>
> [    0.031768] Write protecting the kernel read-only data: 10240k
>
> which, in percentage, is an 18% improvement.

Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Arjan van de Ven <arjan@linux.intel.com>
2015-02-26 12:03:03 -08:00
Paul E. McKenney
0d39482c3d rcu: Provide rcu_expedite_gp() and rcu_unexpedite_gp()
Currently, expediting of normal synchronous grace-period primitives
(synchronize_rcu() and friends) is controlled by the rcu_expedited()
boot/sysfs parameter.  This works well, but does not handle nesting.
This commit therefore provides rcu_expedite_gp() to enable expediting
and rcu_unexpedite_gp() to cancel a prior rcu_expedite_gp(), both of
which support nesting.

Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26 12:02:59 -08:00
Paul E. McKenney
ee376dbdf2 rcu: Consolidate rcu_synchronize and wakeme_after_rcu()
There are currently duplicate identical definitions of the
rcu_synchronize() structure and the wakeme_after_rcu() function.
Thie commit therefore consolidates them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-25 17:03:03 -08:00
Paul E. McKenney
78e691f4ae Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', 'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD
doc.2015.01.07a: Documentation updates.
fixes.2015.01.15a: Miscellaneous fixes.
preempt.2015.01.06a: Changes to handling of lists of preempted tasks.
srcu.2015.01.06a: SRCU updates.
stall.2015.01.16a: RCU CPU stall-warning updates and fixes.
torture.2015.01.11a: RCU torture-test updates and fixes.
2015-01-15 23:34:34 -08:00
Paul E. McKenney
5cd37193ce rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors
Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
in places where it would be useful for it to apply to the normal RCU
flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
case for workloads that aggressively overload the system, particularly
those that generate large numbers of RCU updates on systems running
NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
from cond_resched_rcu_qs() to the normal RCU flavors.

Note that it is unfortunately necessary to leave the old ->passed_quiesce
mechanism in place to allow quiescent states that apply to only one
flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
that case, but that is not so good for debugging of RCU internals.)
In addition, if one of the RCU flavor's grace period has stalled, this
will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight
quiescent state visible from other CPUs.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu()
  was used in preemptible code. ]
2015-01-15 23:33:14 -08:00
Pranith Kumar
ac59853c06 rcupdate: Replace smp_read_barrier_depends() with lockless_dereference()
Recently lockless_dereference() was added which can be used in place of
hard-coding smp_read_barrier_depends(). The following PATCH makes the change.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06 11:01:11 -08:00
Ingo Molnar
d360b78f99 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

 - Streamline RCU's use of per-CPU variables, shifting from "cpu"
   arguments to functions to "this_"-style per-CPU variable accessors.

 - Signal-handling RCU updates.

 - Real-time updates.

 - Torture-test updates.

 - Miscellaneous fixes.

 - Documentation updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-20 08:57:58 +01:00
Paul E. McKenney
9ea6c58856 Merge branches 'torture.2014.11.03a', 'cpu.2014.11.03a', 'doc.2014.11.13a', 'fixes.2014.11.13a', 'signal.2014.10.29a' and 'rt.2014.10.29a' into HEAD
cpu.2014.11.03a: Changes for per-CPU variables.
doc.2014.11.13a: Documentation updates.
fixes.2014.11.13a: Miscellaneous fixes.
signal.2014.10.29a: Signal changes.
rt.2014.10.29a: Real-time changes.
torture.2014.11.03a: torture-test changes.
2014-11-13 10:39:04 -08:00
Oleg Nesterov
ce36f2f3eb rcu: More info about potential deadlocks with rcu_read_unlock()
The comment above rcu_read_unlock() explains the potential deadlock
if the caller holds one of the locks taken by rt_mutex_unlock() paths,
but it is not clear from this documentation that any lock which can
be taken from interrupt can lead to deadlock as well and we need to
take rt_mutex_lock() into account too.

The problem is that rt_mutex_lock() takes wait_lock without disabling
irqs, and thus an interrupt taking some LOCK can obviously race with
rcu_read_unlock_special() called with the same LOCK held.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-11-13 10:35:40 -08:00
Paul E. McKenney
b6331ae8af rcu: Optimize cond_resched_rcu_qs()
The current implementation of cond_resched_rcu_qs() can invoke
rcu_note_voluntary_context_switch() twice in the should_resched()
case, once via the call to __schedule() and once directly.  However, as
noted by Joe Lawrence in a patch to the team subsystem, cond_resched()
returns an indication as to whether or not the call to __schedule()
actually happened.  This commit therefore changes cond_resched_rcu_qs()
so as to invoke rcu_note_voluntary_context_switch() only when __schedule()
was not called.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-11-13 10:35:39 -08:00
Pranith Kumar
1a6c9b2675 rcu: Add sparse check for RCU_INIT_POINTER()
Add a sparse check when RCU_INIT_POINTER() is used to assign a non __rcu
annotated pointer.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-11-13 10:35:38 -08:00
Paul E. McKenney
aa6da5140b rcu: Remove "cpu" argument to rcu_needs_cpu()
The "cpu" argument to rcu_needs_cpu() is always the current CPU, so drop
it.  This in turn allows the "cpu" argument to rcu_cpu_has_callbacks()
to be removed, which allows the uses of "cpu" in both functions to be
replaced with a this_cpu_ptr().  Again, the anticipated cross-CPU uses
of these functions has been replaced by NO_HZ_FULL.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03 19:20:43 -08:00
Paul E. McKenney
c3377c2da6 rcu: Remove "cpu" argument to rcu_check_callbacks()
The "cpu" argument was kept around on the off-chance that RCU might
offload scheduler-clock interrupts.  However, this offload approach
has been replaced by NO_HZ_FULL, which offloads -all- RCU processing
from qualifying CPUs.  It is therefore time to remove the "cpu" argument
to rcu_check_callbacks(), which this commit does.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03 19:20:11 -08:00
Pranith Kumar
28f6569ab7 rcu: Remove redundant TREE_PREEMPT_RCU config option
PREEMPT_RCU and TREE_PREEMPT_RCU serve the same function after
TINY_PREEMPT_RCU has been removed. This patch removes TREE_PREEMPT_RCU
and uses PREEMPT_RCU config option in its place.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-10-29 10:20:05 -07:00
Paul E. McKenney
54ef6df3f3 rcu: Provide counterpart to rcu_dereference() for non-RCU situations
Although rcu_dereference() and friends can be used in situations where
object lifetimes are being managed by something other than RCU, the
resulting sparse and lockdep-RCU noise can be annoying.  This commit
therefore supplies a lockless_dereference(), which provides the
protection for dereferences without the RCU-related debugging noise.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-28 18:25:17 -04:00
Paul E. McKenney
59da22a020 rcutorture: Rename rcutorture_runnable parameter
This commit changes rcutorture_runnable to torture_runnable, which is
consistent with the names of the other parameters and is a bit shorter
as well.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-16 13:41:44 -07:00
Paul E. McKenney
96b4672703 Merge branch 'rcu-tasks.2014.09.10a' into HEAD
rcu-tasks.2014.09.10a: Add RCU-tasks flavor of RCU.
2014-09-16 10:10:44 -07:00
Paul E. McKenney
e98d06dd6c Merge branches 'doc.2014.09.07a', 'fixes.2014.09.10a', 'nocb-nohz.2014.09.16b' and 'torture.2014.09.07a' into HEAD
doc.2014.09.07a: Documentation updates.
fixes.2014.09.10a: Miscellaneous fixes.
nocb-nohz.2014.09.16b: No-CBs CPUs and NO_HZ_FULL updates.
torture.2014.09.07a: Torture-test updates.
2014-09-16 10:08:34 -07:00
Paul E. McKenney
f4579fc57c rcu: Fix attempt to avoid unsolicited offloading of callbacks
Commit b58cc46c5f (rcu: Don't offload callbacks unless specifically
requested) failed to adjust the callback lists of the CPUs that are
known to be no-CBs CPUs only because they are also nohz_full= CPUs.
This failure can result in callbacks that are posted during early boot
getting stranded on nxtlist for CPUs whose no-CBs property becomes
apparent late, and there can also be spurious warnings about offline
CPUs posting callbacks.

This commit fixes these problems by adding an early-boot rcu_init_nohz()
that properly initializes the no-CBs CPUs.

Note that kernels built with CONFIG_RCU_NOCB_CPU_ALL=y or with
CONFIG_RCU_NOCB_CPU=n do not exhibit this bug.  Neither do kernels
booted without the nohz_full= boot parameter.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-09-16 10:07:59 -07:00
Paul E. McKenney
284a8c93af rcu: Per-CPU operation cleanups to rcu_*_qs() functions
The rcu_bh_qs(), rcu_preempt_qs(), and rcu_sched_qs() functions use
old-style per-CPU variable access and write to ->passed_quiesce even
if it is already set.  This commit therefore updates to use the new-style
per-CPU variable access functions and avoids the spurious writes.
This commit also eliminates the "cpu" argument to these functions because
they are always invoked on the indicated CPU.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-07 16:27:35 -07:00
Paul E. McKenney
01a8133034 rcu: Remove redundant preempt_disable() from rcu_note_voluntary_context_switch()
In theory, synchronize_sched() requires a read-side critical section
to order against.  In practice, preemption can be thought of as
being disabled across every machine instruction, at least for those
machine instructions that are not in the idle loop and not on offline
CPUs.  So this commit removes the redundant preempt_disable() from
rcu_note_voluntary_context_switch().

Please note that the single instruction in question is the store of
zero to ->rcu_tasks_holdout.  The "if" is simply a performance optimization
that avoids unnecessary stores.  To see this, keep in mind that both
the "if" condition and the store are in a quiescent state.  Therefore,
even if the task is preempted for a full grace period (presumably due
to its having done a context switch beforehand), the store will be
recording a legitimate quiescent state.

Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Conflicts:
	include/linux/rcupdate.h
2014-09-07 16:27:32 -07:00
Paul E. McKenney
69c604557c rcutorture: Add torture tests for RCU-tasks
This commit adds torture tests for RCU-tasks.  It also fixes a bug that
would segfault for an RCU flavor lacking a callback-barrier function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-09-07 16:27:24 -07:00
Paul E. McKenney
3f95aa81d2 rcu: Make TASKS_RCU handle tasks that are almost done exiting
Once a task has passed exit_notify() in the do_exit() code path, it
is no longer on the task lists, and is therefore no longer visible
to rcu_tasks_kthread().  This means that an almost-exited task might
be preempted while within a trampoline, and this task won't be waited
on by rcu_tasks_kthread().  This commit fixes this bug by adding an
srcu_struct.  An exiting task does srcu_read_lock() just before calling
exit_notify(), and does the corresponding srcu_read_unlock() after
doing the final preempt_disable().  This means that rcu_tasks_kthread()
can do synchronize_srcu() to wait for all mostly-exited tasks to reach
their final preempt_disable() region, and then use synchronize_sched()
to wait for those tasks to finish exiting.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-07 16:27:22 -07:00
Paul E. McKenney
53c6d4edf8 rcu: Add synchronous grace-period waiting for RCU-tasks
It turns out to be easier to add the synchronous grace-period waiting
functions to RCU-tasks than to work around their absense in rcutorture,
so this commit adds them.  The key point is that the existence of
call_rcu_tasks() means that rcutorture needs an rcu_barrier_tasks().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-07 16:27:21 -07:00
Paul E. McKenney
bde6c3aa99 rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops
RCU-tasks requires the occasional voluntary context switch
from CPU-bound in-kernel tasks.  In some cases, this requires
instrumenting cond_resched().  However, there is some reluctance
to countenance unconditionally instrumenting cond_resched() (see
http://lwn.net/Articles/603252/), so this commit creates a separate
cond_resched_rcu_qs() that may be used in place of cond_resched() in
locations prone to long-duration in-kernel looping.

This commit currently instruments only RCU-tasks.  Future possibilities
include also instrumenting RCU, RCU-bh, and RCU-sched in order to reduce
IPI usage.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-07 16:27:20 -07:00
Paul E. McKenney
8315f42295 rcu: Add call_rcu_tasks()
This commit adds a new RCU-tasks flavor of RCU, which provides
call_rcu_tasks().  This RCU flavor's quiescent states are voluntary
context switch (not preemption!) and userspace execution (not the idle
loop -- use some sort of schedule_on_each_cpu() if you need to handle the
idle tasks.  Note that unlike other RCU flavors, these quiescent states
occur in tasks, not necessarily CPUs.  Includes fixes from Steven Rostedt.

This RCU flavor is assumed to have very infrequent latency-tolerant
updaters.  This assumption permits significant simplifications, including
a single global callback list protected by a single global lock, along
with a single task-private linked list containing all tasks that have not
yet passed through a quiescent state.  If experience shows this assumption
to be incorrect, the required additional complexity will be added.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-07 16:27:19 -07:00
Oleg Nesterov
85b39d305b rcu: Uninline rcu_read_lock_held()
This commit uninlines rcu_read_lock_held(). According to "size vmlinux"
this saves 28549 in .text:

	- 5541731 3014560 14757888 23314179
	+ 5513182 3026848 14757888 23297918

Note: it looks as if the data grows by 12288 bytes but this is not true,
it does not actually grow. But .data starts with ALIGN(THREAD_SIZE) and
since .text shrinks the padding grows, and thus .data grows too as it
seen by /bin/size. diff System.map:

	- ffffffff81510000 D _sdata
	- ffffffff81510000 D init_thread_union
	+ ffffffff81509000 D _sdata
	+ ffffffff8150c000 D init_thread_union

Perhaps we can change vmlinux.lds.S to .data itself, so that /bin/size
can't "wrongly" report that .data grows if .text shinks.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-07 16:18:13 -07:00
Pranith Kumar
521d24ee59 rcu: Return bool type in rcu_lockdep_current_cpu_online()
Return true instead of 1 in rcu_lockdep_current_cpu_online() as this
has bool as return type.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-07 16:18:11 -07:00
Paul E. McKenney
ab74fdfd4e rcu: Handle obsolete references to TINY_PREEMPT_RCU
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2014-07-09 09:14:17 -07:00
Paul E. McKenney
f27bc4873f rcu: Document deadlock-avoidance information for rcu_read_unlock()
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2014-07-09 09:14:07 -07:00
Paul E. McKenney
4a81e8328d rcu: Reduce overhead of cond_resched() checks for RCU
Commit ac1bea8578 (Make cond_resched() report RCU quiescent states)
fixed a problem where a CPU looping in the kernel with but one runnable
task would give RCU CPU stall warnings, even if the in-kernel loop
contained cond_resched() calls.  Unfortunately, in so doing, it introduced
performance regressions in Anton Blanchard's will-it-scale "open1" test.
The problem appears to be not so much the increased cond_resched() path
length as an increase in the rate at which grace periods complete, which
increased per-update grace-period overhead.

This commit takes a different approach to fixing this bug, mainly by
moving the RCU-visible quiescent state from cond_resched() to
rcu_note_context_switch(), and by further reducing the check to a
simple non-zero test of a single per-CPU variable.  However, this
approach requires that the force-quiescent-state processing send
resched IPIs to the offending CPUs.  These will be sent only once
the grace period has reached an age specified by the boot/sysfs
parameter rcutree.jiffies_till_sched_qs, or once the grace period
reaches an age halfway to the point at which RCU CPU stall warnings
will be emitted, whichever comes first.

Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Lameter <cl@gentwo.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
[ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the
  ktest build robot.  Also fixed smp_mb() comment as noted by
  Oleg Nesterov. ]

Merge with e552592e (Reduce overhead of cond_resched() checks for RCU)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-06-23 11:19:32 -07:00
Paul E. McKenney
546a9d8519 rcu: Export debug_init_rcu_head() and and debug_init_rcu_head()
Currently, call_rcu() relies on implicit allocation and initialization
for the debug-objects handling of RCU callbacks.  If you hammer the
kernel hard enough with Sasha's modified version of trinity, you can end
up with the sl*b allocators recursing into themselves via this implicit
call_rcu() allocation.

This commit therefore exports the debug_init_rcu_head() and
debug_rcu_head_free() functions, which permits the allocators to allocated
and pre-initialize the debug-objects information, so that there no longer
any need for call_rcu() to do that initialization, which in turn prevents
the recursion into the memory allocators.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Looks-good-to: Christoph Lameter <cl@linux.com>
2014-06-23 11:19:29 -07:00
Rik van Riel
61f38db3e3 rcu: Provide API to suppress stall warnings while sysrc runs
Some sysrq handlers can run for a long time, because they dump a lot
of data onto a serial console. Having RCU stall warnings pop up in
the middle of them only makes the problem worse.

This commit provides rcu_sysrq_start() and rcu_sysrq_end() APIs to
temporarily suppress RCU CPU stall warnings while a sysrq request is
handled.

Signed-off-by: Rik van Riel <riel@redhat.com>
[ paulmck: Fix TINY_RCU build error. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-05-19 10:52:04 -07:00
Paul E. McKenney
f5d2a0450d Merge branches 'doc.2014.04.29a', 'fixes.2014.04.29a' and 'torture.2014.05.14a' into HEAD
doc.2014.04.29a:  Documentation updates.
fixes.2014.04.29a:  Miscellaneous fixes.
torture.2014.05.14a:  RCU/Lock torture tests.
2014-05-14 10:57:31 -07:00
Paul E. McKenney
ac1bea8578 sched,rcu: Make cond_resched() report RCU quiescent states
Given a CPU running a loop containing cond_resched(), with no
other tasks runnable on that CPU, RCU will eventually report RCU
CPU stall warnings due to lack of quiescent states.  Fortunately,
every call to cond_resched() is a perfectly good quiescent state.
Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight
for cond_resched(), especially given the need to disable preemption,
and, for RCU-preempt, interrupts as well.

This commit therefore maintains a per-CPU counter that causes
cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call
rcu_note_context_switch(), but only about once per 256 invocations.
This ratio was chosen in keeping with the relative time constants of
RCU grace periods.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-05-14 09:46:11 -07:00
Paul E. McKenney
ad0dc7f94d rcutorture: Add forward-progress checking for writer
The rcutorture output currently does not distinguish between stalls in
the RCU implementation and stalls in the rcu_torture_writer() kthreads.
This commit therefore adds some diagnostics to help distinguish between
these two conditions, at least for the non-SRCU implementations.  (SRCU
does not provide evidence of update-side forward progress by design.)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-05-13 11:18:18 -07:00
Paul E. McKenney
71a9b26963 rcu: Document RCU_INIT_POINTER()'s lack of ordering guarantees
Although rcu_assign_pointer() provides ordering guarantees,
RCU_INIT_POINTER() does not.  This commit makes that explicit
in the docbook comment header.

Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-04-29 08:45:13 -07:00
Iulia Manda
a235c09165 rcu: Remove "extern" from function declaration in include/linux/rcupdate.h
Because functions have the extern storage class specifier by default,
this keyword can be removed. It is redundant to use it explicitly.

Signed-off-by: Iulia Manda <iulia.manda21@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-04-29 08:44:33 -07:00
Linus Torvalds
b3fd4ea9df Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "Main changes:

   - Torture-test changes, including refactoring of rcutorture and
     introduction of a vestigial locktorture.

   - Real-time latency fixes.

   - Documentation updates.

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
  rcu: Provide grace-period piggybacking API
  rcu: Ensure kernel/rcu/rcu.h can be sourced/used stand-alone
  rcu: Fix sparse warning for rcu_expedited from kernel/ksysfs.c
  notifier: Substitute rcu_access_pointer() for rcu_dereference_raw()
  Documentation/memory-barriers.txt: Clarify release/acquire ordering
  rcutorture: Save kvm.sh output to log
  rcutorture: Add a lock_busted to test the test
  rcutorture: Place kvm-test-1-run.sh output into res directory
  rcutorture: Rename TREE_RCU-Kconfig.txt
  locktorture: Add kvm-recheck.sh plug-in for locktorture
  rcutorture: Gracefully handle NULL cleanup hooks
  locktorture: Add vestigial locktorture configuration
  rcutorture: Introduce "rcu" directory level underneath configs
  rcutorture: Rename kvm-test-1-rcu.sh
  rcutorture: Remove RCU dependencies from ver_functions.sh API
  rcutorture: Create CFcommon file for common Kconfig parameters
  rcutorture: Create config files for scripted test-the-test testing
  rcutorture: Add an rcu_busted to test the test
  locktorture: Add a lock-torture kernel module
  rcutorture: Abstract kvm-recheck.sh
  ...
2014-03-31 11:05:24 -07:00
Paul E. McKenney
322efba5b6 Merge branches 'doc.2014.02.24a', 'fixes.2014.02.26a' and 'rt.2014.02.17b' into HEAD
doc.2014.02.24a: Documentation changes
fixes.2014.02.26a: Miscellaneous fixes
rt.2014.02.17b: Response-time-related changes
2014-02-26 06:36:09 -08:00
Paul Gortmaker
7a75474318 rcu: Fix sparse warning for rcu_expedited from kernel/ksysfs.c
This commit fixes the follwoing warning:

kernel/ksysfs.c:143:5: warning: symbol 'rcu_expedited' was not declared. Should it be static?

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
[ paulmck: Moved the declaration to include/linux/rcupdate.h to avoid
	   including the RCU-internal rcu.h file outside of RCU. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-26 06:35:16 -08:00
Paul E. McKenney
ffa83fb565 rcu: Optimize rcu_needs_cpu() for RCU_NOCB_CPU_ALL
If CONFIG_RCU_NOCB_CPU_ALL=y, then rcu_needs_cpu() will always
return false, however, the current version nevertheless checks
for RCU callbacks.  This commit therefore creates a static inline
implementation of rcu_needs_cpu() that unconditionally returns false
when CONFIG_RCU_NOCB_CPU_ALL=y.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-17 16:03:09 -08:00
Paul E. McKenney
2f33b512a5 rcu: Optimize rcu_is_nocb_cpu() for RCU_NOCB_CPU_ALL
If CONFIG_RCU_NOCB_CPU_ALL=y, then rcu_is_nocb_cpu() will always
return true, however, the current version nevertheless checks
rcu_nocb_mask.  This commit therefore creates a static inline
implementation of rcu_is_nocb_cpu() that unconditionally returns
true when CONFIG_RCU_NOCB_CPU_ALL=y.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-17 15:32:48 -08:00
Paul E. McKenney
88c1863066 rcu: Define rcu_assign_pointer() in terms of smp_store_release()
The new smp_store_release() function provides better guarantees than did
rcu_assign_pointer(), and potentially less overhead on some architectures.
The guarantee that smp_store_release() provides that rcu_assign_pointer()
does that is obscure, but its lack could cause considerable confusion.
This guarantee is illustrated by the following code fragment:

	struct foo {
		int a;
		int b;
		int c;
		struct foo *next;
	};
	struct foo foo1;
	struct foo foo2;
	struct foo __rcu *foop;

	...

	foo2.a = 1;
	foo2.b = 2;
	BUG_ON(foo2.c);
	rcu_assign_pointer(foop, &foo);

	...

	fp = rcu_dereference(foop);
	fp.c = 3;

The current rcu_assign_pointer() semantics permit the BUG_ON() to
trigger because rcu_assign_pointer()'s smp_wmb() is not guaranteed to
order prior reads against later writes.  This commit therefore upgrades
rcu_assign_pointer() from smp_wmb() to smp_store_release() to avoid this
counter-intuitive outcome.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-17 15:02:00 -08:00
Joe Perches
0adab9b9aa rcu: Indentation and spacing fixes.
This commit outdents expression-statement macros, thus repairing a few
line-length complaints.  Also fix some spacing errors called out by
checkpatch.pl.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-17 15:01:52 -08:00
Joe Perches
41f4abd92a rcu: Glue ASCII strings together
Split strings make it difficult to find the code that resulted in a
given console message, so this commit glues split strings back together
despite the resulting long lines.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-17 15:01:47 -08:00