linux_dsm_epyc7002/kernel/locking
Thomas Gleixner 1a1fb985f2 futex: Handle early deadlock return correctly
commit 56222b212e ("futex: Drop hb->lock before enqueueing on the
rtmutex") changed the locking rules in the futex code so that the hash
bucket lock is not longer held while the waiter is enqueued into the
rtmutex wait list. This made the lock and the unlock path symmetric, but
unfortunately the possible early exit from __rt_mutex_proxy_start() due to
a detected deadlock was not updated accordingly. That allows a concurrent
unlocker to observe inconsitent state which triggers the warning in the
unlock path.

futex_lock_pi()                         futex_unlock_pi()
  lock(hb->lock)
  queue(hb_waiter)				lock(hb->lock)
  lock(rtmutex->wait_lock)
  unlock(hb->lock)
                                        // acquired hb->lock
                                        hb_waiter = futex_top_waiter()
                                        lock(rtmutex->wait_lock)
  __rt_mutex_proxy_start()
     ---> fail
          remove(rtmutex_waiter);
     ---> returns -EDEADLOCK
  unlock(rtmutex->wait_lock)
                                        // acquired wait_lock
                                        wake_futex_pi()
                                        rt_mutex_next_owner()
					  --> returns NULL
                                          --> WARN

  lock(hb->lock)
  unqueue(hb_waiter)

The problem is caused by the remove(rtmutex_waiter) in the failure case of
__rt_mutex_proxy_start() as this lets the unlocker observe a waiter in the
hash bucket but no waiter on the rtmutex, i.e. inconsistent state.

The original commit handles this correctly for the other early return cases
(timeout, signal) by delaying the removal of the rtmutex waiter until the
returning task reacquired the hash bucket lock.

Treat the failure case of __rt_mutex_proxy_start() in the same way and let
the existing cleanup code handle the eventual handover of the rtmutex
gracefully. The regular rt_mutex_proxy_start() gains the rtmutex waiter
removal for the failure case, so that the other callsites are still
operating correctly.

Add proper comments to the code so all these details are fully documented.

Thanks to Peter for helping with the analysis and writing the really
valuable code comments.

Fixes: 56222b212e ("futex: Drop hb->lock before enqueueing on the rtmutex")
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: Stefan Liebler <stli@linux.ibm.com>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1901292311410.1950@nanos.tec.linutronix.de
2019-02-08 13:00:36 +01:00
..
lockdep_internals.h locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y 2018-10-09 09:56:33 +02:00
lockdep_proc.c locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y 2018-10-09 09:56:33 +02:00
lockdep_states.h
lockdep.c Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-12-26 14:25:52 -08:00
locktorture.c drm pull for 4.19-rc1 2018-08-15 17:39:07 -07:00
Makefile
mcs_spinlock.h
mutex-debug.c locking/mutex: Replace spin_is_locked() with lockdep 2018-11-12 09:06:22 -08:00
mutex-debug.h
mutex.c kernel/locking/mutex.c: remove caller signal_pending branch predictions 2019-01-04 13:13:48 -08:00
mutex.h
osq_lock.c
percpu-rwsem.c
qrwlock.c
qspinlock_paravirt.h mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
qspinlock_stat.h locking/qspinlock_stat: Count instances of nested lock slowpaths 2018-10-17 08:37:31 +02:00
qspinlock.c locking/pvqspinlock: Extend node size when pvqspinlock is configured 2018-10-17 08:37:32 +02:00
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex.c futex: Handle early deadlock return correctly 2019-02-08 13:00:36 +01:00
rtmutex.h
rwsem-spinlock.c
rwsem-xadd.c locking/rwsem: Fix (possible) missed wakeup 2019-01-21 11:15:39 +01:00
rwsem.c locking/rwsem: Make owner store task pointer of last owning reader 2018-09-10 12:04:07 +02:00
rwsem.h locking/rwsem: Make owner store task pointer of last owning reader 2018-09-10 12:04:07 +02:00
semaphore.c
spinlock_debug.c
spinlock.c
test-ww_mutex.c locking/ww_mutex: Fix runtime warning in the WW mutex selftest 2018-10-03 08:56:31 +02:00