Generic Mutex Subsystem

started by Ingo Molnar <mingo@redhat.com>
updated by Davidlohr Bueso <davidlohr@hp.com>

What are mutexes?
-----------------

In the Linux kernel, mutexes refer to a particular locking primitive
that enforces serialization on shared memory systems, and not only to
the generic term referring to 'mutual exclusion' found in academia
or similar theoretical textbooks. Mutexes are sleeping locks which
behave similarly to binary semaphores, and were introduced in 2006[1]
as an alternative to these. This new data structure provided a number
of advantages, including simpler interfaces, and at that time smaller
code (see Disadvantages).

[1] http://lwn.net/Articles/164802/

Implementation
--------------

Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h
and implemented in kernel/locking/mutex.c. These locks use a three-state
atomic counter (->count) to represent the different possible
transitions that can occur during the lifetime of a lock:

          1: unlocked
          0: locked, no waiters
   negative: locked, with potential waiters

In its most basic form the structure also includes a wait-queue and a
spinlock that serializes access to it. CONFIG_SMP systems can also include
a pointer to the lock task owner (->owner) as well as a spinner MCS
lock (->osq), both described below in (ii).

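The three counter states listed above can be sketched in userspace C11
as follows. This is only an illustration: the enum and function names
are hypothetical, and the kernel keeps this value in the mutex's
->count field rather than in a standalone atomic_int.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical names for the three states of ->count. */
enum sketch_state {
    SKETCH_UNLOCKED,          /* count == 1 */
    SKETCH_LOCKED,            /* count == 0: locked, no waiters */
    SKETCH_LOCKED_CONTENDED,  /* count <  0: locked, with potential waiters */
};

/* Map a raw counter value onto one of the three states. */
enum sketch_state sketch_classify(int count)
{
    if (count == 1)
        return SKETCH_UNLOCKED;
    if (count == 0)
        return SKETCH_LOCKED;
    assert(count < 0);        /* values above 1 are not expected */
    return SKETCH_LOCKED_CONTENDED;
}

/* Start unlocked, apply 'decs' lock attempts, and report the state. */
enum sketch_state sketch_state_after_decs(int decs)
{
    atomic_int count = 1;                /* a fresh mutex is unlocked */

    while (decs-- > 0)
        atomic_fetch_sub(&count, 1);     /* every lock attempt decrements */
    return sketch_classify(atomic_load(&count));
}
```

One decrement takes the counter from unlocked to locked; any further
decrement by a contending task drives it negative, which is what tells
the unlock path that someone may be waiting.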
When acquiring a mutex, there are three possible paths that can be
taken, depending on the state of the lock:

(i) fastpath: tries to atomically acquire the lock by decrementing the
    counter. If it was already taken by another task it goes to the next
    possible path. This logic is architecture specific. On x86-64, the
    locking fastpath is 2 instructions:

    0000000000000e10 <mutex_lock>:
    e21:   f0 ff 0b                lock decl (%rbx)
    e24:   79 08                   jns    e2e <mutex_lock+0x1e>

    the unlocking fastpath is equally tight:

    0000000000000bc0 <mutex_unlock>:
    bc8:   f0 ff 07                lock incl (%rdi)
    bcb:   7f 0a                   jg     bd7 <mutex_unlock+0x17>


(ii) midpath: aka optimistic spinning, tries to spin for acquisition
     while the lock owner is running and there are no other tasks ready
     to run that have higher priority (need_resched). The rationale is
     that if the lock owner is running, it is likely to release the lock
     soon. The mutex spinners are queued up using an MCS lock so that only
     one spinner at a time can compete for the mutex.

     The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock
     with the desirable properties of being fair and of having each CPU that
     tries to acquire the lock spin on a local variable. It avoids the
     expensive cacheline bouncing that common test-and-set spinlock
     implementations incur. An MCS-like lock is specially tailored for
     optimistic spinning in sleeping lock implementations. An important
     feature of the customized MCS lock is that spinners are able to exit
     the MCS spinlock queue when they need to reschedule. This further helps
     avoid situations where MCS spinners that need to reschedule would
     continue waiting to spin on the mutex owner, only to go directly to the
     slowpath upon obtaining the MCS lock.


(iii) slowpath: last resort, if the lock is still unable to be acquired,
      the task is added to the wait-queue and sleeps until woken up by the
      unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE.

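The MCS queueing described under path (ii) can be sketched in userspace
C11 as below. This is a plain textbook MCS lock under stated
assumptions, not the kernel's code: all names are hypothetical,
sched_yield() stands in for the kernel's cpu_relax(), and the sketch
omits the customized property that lets a spinner leave the queue when
it needs to reschedule.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mcs_node {
    _Atomic(struct mcs_node *) next;  /* successor in the queue, if any */
    atomic_bool locked;               /* becomes true when the lock is ours */
};

struct mcs_lock {
    _Atomic(struct mcs_node *) tail;  /* last queued node, NULL when free */
};

/* Userspace stand-in for cpu_relax(). */
static inline void relax_sketch(void) { sched_yield(); }

void mcs_init(struct mcs_lock *lock)
{
    atomic_init(&lock->tail, NULL);
}

void mcs_acquire(struct mcs_lock *lock, struct mcs_node *me)
{
    assert(lock != NULL && me != NULL);
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, false);

    /* Join the tail of the queue; the old tail is our predecessor. */
    struct mcs_node *prev = atomic_exchange(&lock->tail, me);
    if (prev == NULL)
        return;                        /* queue was empty: lock acquired */

    atomic_store(&prev->next, me);     /* link in behind the predecessor */
    while (!atomic_load(&me->locked))  /* spin on our own node only */
        relax_sketch();
}

void mcs_release(struct mcs_lock *lock, struct mcs_node *me)
{
    struct mcs_node *next = atomic_load(&me->next);

    if (next == NULL) {
        struct mcs_node *expected = me;

        /* No visible successor: try to mark the queue empty. */
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;
        /* A successor swapped the tail but has not linked in yet. */
        while ((next = atomic_load(&me->next)) == NULL)
            relax_sketch();
    }
    atomic_store(&next->locked, true); /* hand the lock to the successor */
}
```

Each waiter passes its own node (typically on its stack) to
mcs_acquire() and the same node to mcs_release(). Because every waiter
spins on the 'locked' flag of its own node, a release touches only the
successor's cacheline, and waiters are served in FIFO order.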
While formally kernel mutexes are sleepable locks, it is path (ii) that
makes them, in practice, a hybrid type. By simply not interrupting a
task and busy-waiting for a few cycles instead of immediately sleeping,
this lock has been seen to significantly improve the performance of a
number of workloads. Note that this technique is also used for rw-semaphores.

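The three acquisition paths can be put together in a small userspace
sketch. It is a simplified analogy rather than the kernel
implementation: the names are hypothetical, a pthread mutex and
condition variable stand in for the wait-queue and its spinlock, and
the midpath spins for a fixed, arbitrary number of iterations because
userspace cannot tell whether the lock owner is currently running.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define SKETCH_SPIN_LIMIT 100    /* arbitrary bound, an assumption */

struct sketch_mutex {
    atomic_int count;            /* 1 unlocked, 0 locked, <0 maybe waiters */
    pthread_mutex_t wait_lock;   /* stand-in for the wait-queue spinlock */
    pthread_cond_t wait_queue;   /* stand-in for the wait-queue */
    int waiters;                 /* sleepers, protected by wait_lock */
};

void sketch_mutex_init(struct sketch_mutex *m)
{
    atomic_init(&m->count, 1);
    pthread_mutex_init(&m->wait_lock, NULL);
    pthread_cond_init(&m->wait_queue, NULL);
    m->waiters = 0;
}

void sketch_mutex_lock(struct sketch_mutex *m)
{
    /* (i) fastpath: 1 -> 0 means we took an unlocked mutex. */
    if (atomic_fetch_sub(&m->count, 1) == 1)
        return;

    /* (ii) midpath: spin briefly, hoping the owner releases soon. */
    for (int i = 0; i < SKETCH_SPIN_LIMIT; i++) {
        int unlocked = 1;
        if (atomic_compare_exchange_strong(&m->count, &unlocked, 0))
            return;
    }

    /* (iii) slowpath: queue up and sleep until an unlock wakes us. */
    pthread_mutex_lock(&m->wait_lock);
    m->waiters++;
    /* Writing -1 both tries to take the lock and flags that waiters exist. */
    while (atomic_exchange(&m->count, -1) != 1)
        pthread_cond_wait(&m->wait_queue, &m->wait_lock);
    if (--m->waiters == 0)
        atomic_store(&m->count, 0);   /* back to "locked, no waiters" */
    pthread_mutex_unlock(&m->wait_lock);
}

void sketch_mutex_unlock(struct sketch_mutex *m)
{
    /* fastpath: 0 -> 1 means nobody contended while we held the lock. */
    if (atomic_fetch_add(&m->count, 1) == 0)
        return;

    /* slowpath: mark unlocked, then wake one sleeper (if any). */
    pthread_mutex_lock(&m->wait_lock);
    atomic_store(&m->count, 1);
    pthread_cond_signal(&m->wait_queue);
    pthread_mutex_unlock(&m->wait_lock);
}

static struct sketch_mutex demo_mutex;
static long demo_counter;

static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        sketch_mutex_lock(&demo_mutex);
        demo_counter++;               /* critical section */
        sketch_mutex_unlock(&demo_mutex);
    }
    return NULL;
}

/* Run four workers and return the final value of the shared counter. */
long sketch_demo(void)
{
    pthread_t threads[4];

    sketch_mutex_init(&demo_mutex);
    demo_counter = 0;
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&threads[i], NULL, demo_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    return demo_counter;
}
```

sketch_demo() exercises all three paths under contention; if mutual
exclusion holds, the four workers' increments are never lost. A woken
sleeper is not guaranteed the lock, since a fastpath or midpath task
may take it first, in which case the sleeper simply goes back to sleep.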
Semantics
---------

The mutex subsystem checks and enforces the following rules:

    - Only one task can hold the mutex at a time.
    - Only the owner can unlock the mutex.
    - Multiple unlocks are not permitted.
    - Recursive locking/unlocking is not permitted.
    - A mutex must only be initialized via the API (see below).
    - A task may not exit with a mutex held.
    - Memory areas where held locks reside must not be freed.
    - Held mutexes must not be reinitialized.
    - Mutexes may not be used in hardware or software interrupt
      contexts such as tasklets and timers.

These semantics are fully enforced when CONFIG_DEBUG_MUTEXES is enabled.
In addition, the mutex debugging code also implements a number of other
features that make lock debugging easier and faster:

    - Uses symbolic names of mutexes, whenever they are printed
      in debug output.
    - Point-of-acquire tracking, symbolic lookup of function names,
      list of all locks held in the system, printout of them.
    - Owner tracking.
    - Detects self-recursing locks and prints out all relevant info.
    - Detects multi-task circular deadlocks and prints out all affected
      locks and tasks (and only those tasks).


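Some of the rules above (no recursive locking, only the owner may
unlock, no multiple unlocks) have a close userspace analogue in POSIX
error-checking mutexes. The sketch below uses that analogue purely for
illustration; it does not show how CONFIG_DEBUG_MUTEXES is implemented,
and the helper names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Create an error-checking mutex; returns 0 on success. */
int checked_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Recursive locking is refused: returns the error of the second lock. */
int try_recursive_lock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);

    assert(rc == 0);
    rc = pthread_mutex_lock(m);   /* same thread already owns the mutex */
    pthread_mutex_unlock(m);      /* drop the one lock we really hold */
    return rc;
}

/* Unlocking a mutex the caller does not hold is refused. */
int try_unlock_unowned(pthread_mutex_t *m)
{
    return pthread_mutex_unlock(m);
}
```

With an error-checking mutex, POSIX specifies EDEADLK for the second
lock attempt by the owner and EPERM for an unlock by a thread that does
not hold the mutex, so both misuse patterns are reported instead of
silently corrupting the lock state.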
Interfaces
----------
Statically define the mutex:
   DEFINE_MUTEX(name);

Dynamically initialize the mutex:
   mutex_init(mutex);

Acquire the mutex, uninterruptible:
   void mutex_lock(struct mutex *lock);
   void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
   int  mutex_trylock(struct mutex *lock);

Acquire the mutex, interruptible:
   int mutex_lock_interruptible_nested(struct mutex *lock,
                                       unsigned int subclass);
   int mutex_lock_interruptible(struct mutex *lock);

Acquire the mutex, uninterruptible, if the counter is decremented to 0:
   int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);

Unlock the mutex:
   void mutex_unlock(struct mutex *lock);

Test if the mutex is taken:
   int mutex_is_locked(struct mutex *lock);

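The "decrement and lock if it reaches 0" interface listed above is the
least self-explanatory of the set. A userspace sketch of the idea,
assuming C11 atomics and a pthread mutex in place of atomic_t and
'struct mutex' (the function names here are hypothetical), could look
like this:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Decrement *cnt unless it equals 'unless'; true if it was decremented. */
static bool dec_unless(atomic_int *cnt, int unless)
{
    int old = atomic_load(cnt);

    while (old != unless) {
        /* On failure 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return true;
    }
    return false;
}

/* Returns 1 with the lock held if the count dropped to 0, otherwise 0. */
int dec_and_lock_sketch(atomic_int *cnt, pthread_mutex_t *lock)
{
    assert(cnt != NULL && lock != NULL);

    /* Common case: the decrement cannot reach 0, so no lock is needed. */
    if (dec_unless(cnt, 1))
        return 0;

    /* We might hit 0: take the lock before decrementing. */
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(cnt, 1) != 1) {
        /* The count was raised in the meantime; we did not hit 0. */
        pthread_mutex_unlock(lock);
        return 0;
    }
    return 1;   /* hit 0 and the caller now holds the lock */
}
```

A typical use is dropping a reference count where the final put must
tear the object down under a lock: callers that do not release the last
reference never touch the lock, and the caller that does release it
returns with the lock already held and must unlock it when done.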
Disadvantages
-------------

Contrary to its original design and purpose, 'struct mutex' is larger than
most locks in the kernel. For example, on x86-64 it is 40 bytes, almost
twice as large as 'struct semaphore' (24 bytes), and tied, along with
rwsems, for the largest lock in the kernel. Larger structure sizes mean
more CPU cache and memory footprint.

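The 40 versus 24 byte figures can be reproduced with stand-in
structures. The layouts below are assumptions made for illustration
(x86-64, no lock debugging, a 4-byte spinlock, and ->osq modeled as one
pointer-sized slot); they are not copies of the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for kernel types; sizes assume a 64-bit (LP64) target. */
struct list_head_sketch { void *next, *prev; };   /* 16 bytes */
typedef int32_t atomic_sketch_t;                   /*  4 bytes */
typedef uint32_t spinlock_sketch_t;                /*  4 bytes */

struct mutex_sketch {
    atomic_sketch_t count;              /* three-state counter       */
    spinlock_sketch_t wait_lock;        /* protects wait_list        */
    struct list_head_sketch wait_list;  /* sleeping waiters          */
    void *owner;                        /* lock owner task (->owner) */
    void *osq;                          /* spinner MCS lock (->osq)  */
};

struct semaphore_sketch {
    spinlock_sketch_t lock;
    unsigned int count;
    struct list_head_sketch wait_list;
};

/* 4 + 4 + 16 + 8 + 8 = 40 and 4 + 4 + 16 = 24 with 8-byte pointers. */
_Static_assert(sizeof(void *) != 8 || sizeof(struct mutex_sketch) == 40,
               "mutex sketch should be 40 bytes on 64-bit targets");
_Static_assert(sizeof(void *) != 8 || sizeof(struct semaphore_sketch) == 24,
               "semaphore sketch should be 24 bytes on 64-bit targets");

size_t mutex_sketch_size(void)     { return sizeof(struct mutex_sketch); }
size_t semaphore_sketch_size(void) { return sizeof(struct semaphore_sketch); }
```

The extra 16 bytes come entirely from the two fields that support
optimistic spinning, which is the price paid for the midpath.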
When to use mutexes
-------------------

Unless the strict semantics of mutexes are unsuitable and/or the critical
region prevents the lock from being shared, always prefer them to any other
locking primitive.