linux_dsm_epyc7002/Documentation/RCU/lockdep-splat.txt

Lockdep-RCU was added to the Linux kernel in early 2010
(http://lwn.net/Articles/371986/).  This facility checks for some common
misuses of the RCU API, most notably using one of the rcu_dereference()
family to access an RCU-protected pointer without the proper protection.
When such misuse is detected, an lockdep-RCU splat is emitted.

The usual cause of a lockdep-RCU slat is someone accessing an
RCU-protected data structure without either (1) being in the right kind of
RCU read-side critical section or (2) holding the right update-side lock.
This problem can therefore be serious: it might result in random memory
overwriting or worse.  There can of course be false positives, this
being the real world and all that.

So let's look at an example RCU lockdep splat from 3.0-rc5, one that
has long since been fixed:

=============================
WARNING: suspicious RCU usage
-----------------------------
block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 0
3 locks held by scsi_scan_6/1552:
 #0:  (&shost->scan_mutex){+.+.}, at: [<ffffffff8145efca>]
scsi_scan_host_selected+0x5a/0x150
 #1:  (&eq->sysfs_lock){+.+.}, at: [<ffffffff812a5032>]
elevator_exit+0x22/0x60
 #2:  (&(&q->__queue_lock)->rlock){-.-.}, at: [<ffffffff812b6233>]
cfq_exit_queue+0x43/0x190

stack backtrace:
Pid: 1552, comm: scsi_scan_6 Not tainted 3.0.0-rc5 #17
Call Trace:
 [<ffffffff810abb9b>] lockdep_rcu_dereference+0xbb/0xc0
 [<ffffffff812b6139>] __cfq_exit_single_io_context+0xe9/0x120
 [<ffffffff812b626c>] cfq_exit_queue+0x7c/0x190
 [<ffffffff812a5046>] elevator_exit+0x36/0x60
 [<ffffffff812a802a>] blk_cleanup_queue+0x4a/0x60
 [<ffffffff8145cc09>] scsi_free_queue+0x9/0x10
 [<ffffffff81460944>] __scsi_remove_device+0x84/0xd0
 [<ffffffff8145dca3>] scsi_probe_and_add_lun+0x353/0xb10
 [<ffffffff817da069>] ? error_exit+0x29/0xb0
 [<ffffffff817d98ed>] ? _raw_spin_unlock_irqrestore+0x3d/0x80
 [<ffffffff8145e722>] __scsi_scan_target+0x112/0x680
 [<ffffffff812c690d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff817da069>] ? error_exit+0x29/0xb0
 [<ffffffff812bcc60>] ? kobject_del+0x40/0x40
 [<ffffffff8145ed16>] scsi_scan_channel+0x86/0xb0
 [<ffffffff8145f0b0>] scsi_scan_host_selected+0x140/0x150
 [<ffffffff8145f149>] do_scsi_scan_host+0x89/0x90
 [<ffffffff8145f170>] do_scan_async+0x20/0x160
 [<ffffffff8145f150>] ? do_scsi_scan_host+0x90/0x90
 [<ffffffff810975b6>] kthread+0xa6/0xb0
 [<ffffffff817db154>] kernel_thread_helper+0x4/0x10
 [<ffffffff81066430>] ? finish_task_switch+0x80/0x110
 [<ffffffff817d9c04>] ? retint_restore_args+0xe/0xe
 [<ffffffff81097510>] ? __kthread_init_worker+0x70/0x70
 [<ffffffff817db150>] ? gs_change+0xb/0xb

Line 2776 of block/cfq-iosched.c in v3.0-rc5 is as follows:

	if (rcu_dereference(ioc->ioc_data) == cic) {

This form says that it must be in a plain vanilla RCU read-side critical
section, but the "other info" list above shows that this is not the
case.  Instead, we hold three locks, one of which might be RCU related.
And maybe that lock really does protect this reference.  If so, the fix
is to inform RCU, perhaps by changing __cfq_exit_single_io_context() to
take the struct request_queue "q" from cfq_exit_queue() as an argument,
which would permit us to invoke rcu_dereference_protected as follows:

	if (rcu_dereference_protected(ioc->ioc_data,
				      lockdep_is_held(&q->queue_lock)) == cic) {

With this change, there would be no lockdep-RCU splat emitted if this
code was invoked either from within an RCU read-side critical section
or with the ->queue_lock held.  In particular, this would have suppressed
the above lockdep-RCU splat because ->queue_lock is held (see #2 in the
list above).

On the other hand, perhaps we really do need an RCU read-side critical
section.  In this case, the critical section must span the use of the
return value from rcu_dereference(), or at least until there is some
reference count incremented or some such.  One way to handle this is to
add rcu_read_lock() and rcu_read_unlock() as follows:

	rcu_read_lock();
	if (rcu_dereference(ioc->ioc_data) == cic) {
		spin_lock(&ioc->lock);
		rcu_assign_pointer(ioc->ioc_data, NULL);
		spin_unlock(&ioc->lock);
	}
	rcu_read_unlock();

With this change, the rcu_dereference() is always within an RCU
read-side critical section, which again would have suppressed the
above lockdep-RCU splat.

But in this particular case, we don't actually dereference the pointer
returned from rcu_dereference().  Instead, that pointer is just compared
to the cic pointer, which means that the rcu_dereference() can be replaced
by rcu_access_pointer() as follows:

	if (rcu_access_pointer(ioc->ioc_data) == cic) {

Because it is legal to invoke rcu_access_pointer() without protection,
this change would also suppress the above lockdep-RCU splat.
rcu: Document interpretation of RCU-lockdep splats There has been quite a bit of confusion about what RCU-lockdep splats mean, so this commit adds some documentation describing how to interpret them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2011-07-22 07:10:40 +07:00			`Lockdep-RCU was added to the Linux kernel in early 2010`
			`(http://lwn.net/Articles/371986/). This facility checks for some common`
			`misuses of the RCU API, most notably using one of the rcu_dereference()`
			`family to access an RCU-protected pointer without the proper protection.`
			`When such misuse is detected, an lockdep-RCU splat is emitted.`

			`The usual cause of a lockdep-RCU slat is someone accessing an`
			`RCU-protected data structure without either (1) being in the right kind of`
			`RCU read-side critical section or (2) holding the right update-side lock.`
			`This problem can therefore be serious: it might result in random memory`
			`overwriting or worse. There can of course be false positives, this`
			`being the real world and all that.`

			`So let's look at an example RCU lockdep splat from 3.0-rc5, one that`
			`has long since been fixed:`

doc: rcu: Suspicious RCU usage is a warning Suspicious RCU usage messages are reported as warnings. Fixes: a5dd63efda3d07b5 ("lockdep: Use "WARNING" tag on lockdep splats") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> 2019-02-28 17:59:32 +07:00			`=============================`
			`WARNING: suspicious RCU usage`
			`-----------------------------`
rcu: Document interpretation of RCU-lockdep splats There has been quite a bit of confusion about what RCU-lockdep splats mean, so this commit adds some documentation describing how to interpret them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2011-07-22 07:10:40 +07:00			`block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage!`

			`other info that might help us debug this:`


			`rcu_scheduler_active = 1, debug_locks = 0`
			`3 locks held by scsi_scan_6/1552:`
Documentation/locking/lockdep: Drop last two chars of sample states Since the removal of FS_RECLAIM annotations, lockdep states contain four characters, not six. Fixes: e5684bbfc3f03480 ("Documentation/locking/lockdep: Update info about states") Fixes: d92a8cfcb37ecd13 ("locking/lockdep: Rework FS_RECLAIM annotation") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> 2019-03-01 16:40:52 +07:00			`#0: (&shost->scan_mutex){+.+.}, at: [<ffffffff8145efca>]`
rcu: Document interpretation of RCU-lockdep splats There has been quite a bit of confusion about what RCU-lockdep splats mean, so this commit adds some documentation describing how to interpret them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2011-07-22 07:10:40 +07:00			`scsi_scan_host_selected+0x5a/0x150`
Documentation/locking/lockdep: Drop last two chars of sample states Since the removal of FS_RECLAIM annotations, lockdep states contain four characters, not six. Fixes: e5684bbfc3f03480 ("Documentation/locking/lockdep: Update info about states") Fixes: d92a8cfcb37ecd13 ("locking/lockdep: Rework FS_RECLAIM annotation") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> 2019-03-01 16:40:52 +07:00			`#1: (&eq->sysfs_lock){+.+.}, at: [<ffffffff812a5032>]`
rcu: Document interpretation of RCU-lockdep splats There has been quite a bit of confusion about what RCU-lockdep splats mean, so this commit adds some documentation describing how to interpret them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2011-07-22 07:10:40 +07:00			`elevator_exit+0x22/0x60`
Documentation/locking/lockdep: Drop last two chars of sample states Since the removal of FS_RECLAIM annotations, lockdep states contain four characters, not six. Fixes: e5684bbfc3f03480 ("Documentation/locking/lockdep: Update info about states") Fixes: d92a8cfcb37ecd13 ("locking/lockdep: Rework FS_RECLAIM annotation") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> 2019-03-01 16:40:52 +07:00			`#2: (&(&q->__queue_lock)->rlock){-.-.}, at: [<ffffffff812b6233>]`
rcu: Document interpretation of RCU-lockdep splats There has been quite a bit of confusion about what RCU-lockdep splats mean, so this commit adds some documentation describing how to interpret them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2011-07-22 07:10:40 +07:00			`cfq_exit_queue+0x43/0x190`

			`stack backtrace:`
			`Pid: 1552, comm: scsi_scan_6 Not tainted 3.0.0-rc5 #17`
			`Call Trace:`
			`[<ffffffff810abb9b>] lockdep_rcu_dereference+0xbb/0xc0`
			`[<ffffffff812b6139>] __cfq_exit_single_io_context+0xe9/0x120`
			`[<ffffffff812b626c>] cfq_exit_queue+0x7c/0x190`
			`[<ffffffff812a5046>] elevator_exit+0x36/0x60`
			`[<ffffffff812a802a>] blk_cleanup_queue+0x4a/0x60`
			`[<ffffffff8145cc09>] scsi_free_queue+0x9/0x10`
			`[<ffffffff81460944>] __scsi_remove_device+0x84/0xd0`
			`[<ffffffff8145dca3>] scsi_probe_and_add_lun+0x353/0xb10`
			`[<ffffffff817da069>] ? error_exit+0x29/0xb0`
			`[<ffffffff817d98ed>] ? _raw_spin_unlock_irqrestore+0x3d/0x80`
			`[<ffffffff8145e722>] __scsi_scan_target+0x112/0x680`
			`[<ffffffff812c690d>] ? trace_hardirqs_off_thunk+0x3a/0x3c`
			`[<ffffffff817da069>] ? error_exit+0x29/0xb0`
			`[<ffffffff812bcc60>] ? kobject_del+0x40/0x40`
			`[<ffffffff8145ed16>] scsi_scan_channel+0x86/0xb0`
			`[<ffffffff8145f0b0>] scsi_scan_host_selected+0x140/0x150`
			`[<ffffffff8145f149>] do_scsi_scan_host+0x89/0x90`
			`[<ffffffff8145f170>] do_scan_async+0x20/0x160`
			`[<ffffffff8145f150>] ? do_scsi_scan_host+0x90/0x90`
			`[<ffffffff810975b6>] kthread+0xa6/0xb0`
			`[<ffffffff817db154>] kernel_thread_helper+0x4/0x10`
			`[<ffffffff81066430>] ? finish_task_switch+0x80/0x110`
			`[<ffffffff817d9c04>] ? retint_restore_args+0xe/0xe`
kthread: kthread worker API cleanup A good practice is to prefix the names of functions by the name of the subsystem. The kthread worker API is a mix of classic kthreads and workqueues. Each worker has a dedicated kthread. It runs a generic function that process queued works. It is implemented as part of the kthread subsystem. This patch renames the existing kthread worker API to use the corresponding name from the workqueues API prefixed by kthread_: __init_kthread_worker() -> __kthread_init_worker() init_kthread_worker() -> kthread_init_worker() init_kthread_work() -> kthread_init_work() insert_kthread_work() -> kthread_insert_work() queue_kthread_work() -> kthread_queue_work() flush_kthread_work() -> kthread_flush_work() flush_kthread_worker() -> kthread_flush_worker() Note that the names of DEFINE_KTHREAD_WORK*() macros stay as they are. It is common that the "DEFINE_" prefix has precedence over the subsystem names. Note that INIT() macros and init() functions use different naming scheme. There is no good solution. There are several reasons for this solution: + "init" in the function names stands for the verb "initialize" aka "initialize worker". While "INIT" in the macro names stands for the noun "INITIALIZER" aka "worker initializer". + INIT() macros are used only in DEFINE() macros + init() functions are used close to the other kthread() functions. It looks much better if all the functions use the same scheme. + There will be also kthread_destroy_worker() that will be used close to kthread_cancel_work(). It is related to the init() function. Again it looks better if all functions use the same naming scheme. + there are several precedents for such init() function names, e.g. amd_iommu_init_device(), free_area_init_node(), jump_label_init_type(), regmap_init_mmio_clk(), + It is not an argument but it was inconsistent even before. [arnd@arndb.de: fix linux-next merge conflict] Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.com Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 2016-10-12 03:55:20 +07:00			`[<ffffffff81097510>] ? __kthread_init_worker+0x70/0x70`
rcu: Document interpretation of RCU-lockdep splats There has been quite a bit of confusion about what RCU-lockdep splats mean, so this commit adds some documentation describing how to interpret them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2011-07-22 07:10:40 +07:00			`[<ffffffff817db150>] ? gs_change+0xb/0xb`

			`Line 2776 of block/cfq-iosched.c in v3.0-rc5 is as follows:`

			`if (rcu_dereference(ioc->ioc_data) == cic) {`

			`This form says that it must be in a plain vanilla RCU read-side critical`
			`section, but the "other info" list above shows that this is not the`
			`case. Instead, we hold three locks, one of which might be RCU related.`
			`And maybe that lock really does protect this reference. If so, the fix`
			`is to inform RCU, perhaps by changing __cfq_exit_single_io_context() to`
			`take the struct request_queue "q" from cfq_exit_queue() as an argument,`
			`which would permit us to invoke rcu_dereference_protected as follows:`

			`if (rcu_dereference_protected(ioc->ioc_data,`
			`lockdep_is_held(&q->queue_lock)) == cic) {`

			`With this change, there would be no lockdep-RCU splat emitted if this`
			`code was invoked either from within an RCU read-side critical section`
			`or with the ->queue_lock held. In particular, this would have suppressed`
			`the above lockdep-RCU splat because ->queue_lock is held (see #2 in the`
			`list above).`

			`On the other hand, perhaps we really do need an RCU read-side critical`
			`section. In this case, the critical section must span the use of the`
			`return value from rcu_dereference(), or at least until there is some`
			`reference count incremented or some such. One way to handle this is to`
			`add rcu_read_lock() and rcu_read_unlock() as follows:`

			`rcu_read_lock();`
			`if (rcu_dereference(ioc->ioc_data) == cic) {`
			`spin_lock(&ioc->lock);`
			`rcu_assign_pointer(ioc->ioc_data, NULL);`
			`spin_unlock(&ioc->lock);`
			`}`
			`rcu_read_unlock();`

			`With this change, the rcu_dereference() is always within an RCU`
			`read-side critical section, which again would have suppressed the`
			`above lockdep-RCU splat.`

doc: Fix typo "deference" to "dereference" Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> 2019-11-14 00:12:59 +07:00			`But in this particular case, we don't actually dereference the pointer`
rcu: Document interpretation of RCU-lockdep splats There has been quite a bit of confusion about what RCU-lockdep splats mean, so this commit adds some documentation describing how to interpret them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2011-07-22 07:10:40 +07:00			`returned from rcu_dereference(). Instead, that pointer is just compared`
			`to the cic pointer, which means that the rcu_dereference() can be replaced`
			`by rcu_access_pointer() as follows:`

			`if (rcu_access_pointer(ioc->ioc_data) == cic) {`

			`Because it is legal to invoke rcu_access_pointer() without protection,`
			`this change would also suppress the above lockdep-RCU splat.`