linux_dsm_epyc7002/kernel/rcu
Paul E. McKenney ba3a86e472 rcu-tasks: Fix grace-period/unlock race in RCU Tasks Trace
The more intense grace-period processing resulting from the 50x RCU
Tasks Trace grace-period speedups exposed the following race condition:

o	Task A running on CPU 0 executes rcu_read_lock_trace(),
	entering a read-side critical section.

o	When Task A eventually invokes rcu_read_unlock_trace()
	to exit its read-side critical section, this function
	notes that the ->trc_reader_special.s flag is zero and
	and therefore invoke wil set ->trc_reader_nesting to zero
	using WRITE_ONCE().  But before that happens...

o	The RCU Tasks Trace grace-period kthread running on some other
	CPU interrogates Task A, but this fails because this task is
	currently running.  This kthread therefore sends an IPI to CPU 0.

o	CPU 0 receives the IPI, and thus invokes trc_read_check_handler().
	Because Task A has not yet cleared its ->trc_reader_nesting
	counter, this function sees that Task A is still within its
	read-side critical section.  This function therefore sets the
	->trc_reader_nesting.b.need_qs flag, AKA the .need_qs flag.

	Except that Task A has already checked the .need_qs flag, which
	is part of the ->trc_reader_special.s flag.  The .need_qs flag
	therefore remains set until Task A's next rcu_read_unlock_trace().

o	Task A now invokes synchronize_rcu_tasks_trace(), which cannot
	start a new grace period until the current grace period completes.
	And thus cannot return until after that time.

	But Task A's .need_qs flag is still set, which prevents the current
	grace period from completing.  And because Task A is blocked, it
	will never execute rcu_read_unlock_trace() until its call to
	synchronize_rcu_tasks_trace() returns.

	We are therefore deadlocked.

This race is improbable, but 80 hours of rcutorture made it happen twice.
The race was possible before the grace-period speedup, but roughly 50x
less probable.  Several thousand hours of rcutorture would have been
necessary to have a reasonable chance of making this happen before this
50x speedup.

This commit therefore eliminates this deadlock by setting
->trc_reader_nesting to a large negative number before checking the
.need_qs and zeroing (or decrementing with respect to its initial
value) ->trc_reader_nesting.  For its part, the IPI handler's
trc_read_check_handler() function adds a check for negative values,
deferring evaluation of the task in this case.  Taken together, these
changes avoid this deadlock scenario.

Fixes: 276c410448 ("rcu-tasks: Split ->trc_reader_need_end")
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: <bpf@vger.kernel.org>
Cc: <stable@vger.kernel.org> # 5.7.x
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-09-16 16:32:37 -07:00
..
Kconfig rcu-tasks: Add Kconfig option to mediate smp_mb() vs. IPI 2020-04-27 11:03:52 -07:00
Kconfig.debug refperf: Rename RCU_REF_PERF_TEST to RCU_REF_SCALE_TEST 2020-06-29 12:00:46 -07:00
Makefile refperf: Rename refperf.c to refscale.c and change internal names 2020-06-29 12:00:46 -07:00
rcu_segcblist.c rcu: Remove dead code from rcu_segcblist_insert_pend_cbs() 2020-02-20 15:58:23 -08:00
rcu_segcblist.h rcu: Remove kfree_rcu() special casing and lazy-callback handling 2020-01-24 10:24:31 -08:00
rcu.h Merge branches 'fixes.2020.04.27a', 'kfree_rcu.2020.04.27a', 'rcu-tasks.2020.04.27a', 'stall.2020.04.27a' and 'torture.2020.05.07a' into HEAD 2020-05-07 10:18:32 -07:00
rcuperf.c This tree adds the sched_set_fifo*() encapsulation APIs to remove 2020-08-06 11:55:43 -07:00
rcutorture.c This tree adds the sched_set_fifo*() encapsulation APIs to remove 2020-08-06 11:55:43 -07:00
refscale.c refperf: Rename refperf.c to refscale.c and change internal names 2020-06-29 12:00:46 -07:00
srcutiny.c rcu: Use CONFIG_PREEMPTION where appropriate 2019-12-09 12:37:51 -08:00
srcutree.c srcu: Avoid local_irq_save() before acquiring spinlock_t 2020-06-29 12:01:22 -07:00
sync.c rcu/sync: Simplify the state machine 2019-05-28 09:05:23 -07:00
tasks.h rcu-tasks: Fix grace-period/unlock race in RCU Tasks Trace 2020-09-16 16:32:37 -07:00
tiny.c rcu: Rename *_kfree_callback/*_kfree_rcu_offset/kfree_call_* 2020-06-29 11:59:25 -07:00
tree_exp.h rcu: Expedited grace-period sleeps to idle priority 2020-06-29 11:58:50 -07:00
tree_plugin.h rcu: No-CBs-related sleeps to idle priority 2020-06-29 11:58:50 -07:00
tree_stall.h Merge branches 'doc.2020.06.29a', 'fixes.2020.06.29a', 'kfree_rcu.2020.06.29a', 'rcu-tasks.2020.06.29a', 'scale.2020.06.29a', 'srcu.2020.06.29a' and 'torture.2020.06.29a' into HEAD 2020-06-29 12:03:15 -07:00
tree.c rcu: kasan: record and print call_rcu() call stack 2020-08-07 11:33:28 -07:00
tree.h rcu: grpnum just records group number 2020-06-29 11:58:51 -07:00
update.c Merge branches 'doc.2020.06.29a', 'fixes.2020.06.29a', 'kfree_rcu.2020.06.29a', 'rcu-tasks.2020.06.29a', 'scale.2020.06.29a', 'srcu.2020.06.29a' and 'torture.2020.06.29a' into HEAD 2020-06-29 12:03:15 -07:00