linux_dsm_epyc7002/kernel/locking
Davidlohr Bueso 036cc30c6b locking/osq: No need for load/acquire when acquire-polling
Both mutexes and rwsems took a performance hit when we switched
over from the original MCS code to the cancelable variant (OSQ).
The reason is the use of smp_load_acquire() when polling for
node->locked. This is not needed, as reordering is not an issue;
as such, relax the barrier semantics (a sketch of the resulting
polling pattern follows the list below). Paul describes the
scenario nicely: https://lkml.org/lkml/2013/11/19/405

  - If we start polling before the insertion is complete, all that
    happens is that the first few polls have no chance of seeing a lock
    grant.

  - Ordering the polling against the initialization -- the xchg()
    that inserts the node into the queue is already doing that for us.
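
For illustration, a minimal userspace C11 analogue of the resulting
polling pattern. The kernel's osq_lock() uses its own per-CPU
optimistic_spin_node and cpu_relax()-style spinning; every name
below is made up for the sketch and is not the actual patch:

    #include <sched.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct spin_node {
            _Atomic(struct spin_node *) next, prev;
            atomic_int locked;
    };

    static _Atomic(struct spin_node *) queue_tail;

    /* Queue up and spin until granted the (notional) lock. */
    static void queue_and_poll(struct spin_node *node)
    {
            atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
            atomic_store_explicit(&node->locked, 0, memory_order_relaxed);

            /* The fully ordered exchange publishes the initialized node;
             * this is the xchg() that already orders the polling against
             * the initialization. */
            struct spin_node *prev = atomic_exchange(&queue_tail, node);
            if (!prev)
                    return;          /* queue was empty: lock acquired */

            atomic_store_explicit(&node->prev, prev, memory_order_relaxed);
            atomic_store(&prev->next, node);   /* complete the insertion */

            /* Relaxed polling: if this starts before the insertion above
             * is complete, the first few polls simply see no lock grant. */
            while (!atomic_load_explicit(&node->locked, memory_order_relaxed))
                    sched_yield();   /* stand-in for cpu_relax() */
    }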

The smp_load_acquire() when unqueuing makes sense (see the sketch
below). In addition, we don't need to worry about leaking the
critical region, as osq is only used internally.
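
To contrast, a sketch of the cancellation path, reusing the types
from the sketch above. The kernel's unqueue logic has more steps;
this only shows why the acquire load is kept:

    /* Back out of the queue, e.g. because we need to reschedule.
     * Returns true if the lock was granted while backing out. */
    static bool cancel_and_unqueue(struct spin_node *node)
    {
            struct spin_node *prev =
                    atomic_load_explicit(&node->prev, memory_order_relaxed);

            for (;;) {
                    /* Try to unlink ourselves from the predecessor. */
                    struct spin_node *expected = node;
                    if (atomic_load(&prev->next) == node &&
                        atomic_compare_exchange_strong(&prev->next,
                                                       &expected, NULL))
                            return false;   /* unlinked; cancel proceeds */

                    /* The grant races with the unlink.  Acquire matters
                     * here: observing locked == 1 means we return with
                     * the lock held, and later accesses must not be
                     * speculated before that observation. */
                    if (atomic_load_explicit(&node->locked,
                                             memory_order_acquire))
                            return true;

                    sched_yield();
                    prev = atomic_load_explicit(&node->prev,
                                                memory_order_relaxed);
            }
    }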

This helps at both regular and large levels of concurrency, e.g.
on a 40-core system with a disk-intensive workload (baseline vs.
patched throughput, relative gain in parentheses):

	disk-1               804.83 (  0.00%)      828.16 (  2.90%)
	disk-61             8063.45 (  0.00%)    18181.82 (125.48%)
	disk-121            7187.41 (  0.00%)    20119.17 (179.92%)
	disk-181            6933.32 (  0.00%)    20509.91 (195.82%)
	disk-241            6850.81 (  0.00%)    20397.80 (197.74%)
	disk-301            6815.22 (  0.00%)    20287.58 (197.68%)
	disk-361            7080.40 (  0.00%)    20205.22 (185.37%)
	disk-421            7076.13 (  0.00%)    19957.33 (182.04%)
	disk-481            7083.25 (  0.00%)    19784.06 (179.31%)
	disk-541            7038.39 (  0.00%)    19610.92 (178.63%)
	disk-601            7072.04 (  0.00%)    19464.53 (175.23%)
	disk-661            7010.97 (  0.00%)    19348.23 (175.97%)
	disk-721            7069.44 (  0.00%)    19255.33 (172.37%)
	disk-781            7007.58 (  0.00%)    19103.14 (172.61%)
	disk-841            6981.18 (  0.00%)    18964.22 (171.65%)
	disk-901            6968.47 (  0.00%)    18826.72 (170.17%)
	disk-961            6964.61 (  0.00%)    18708.02 (168.62%)

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1420573509-24774-7-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-14 15:16:20 +01:00
lglock.c locking: Move the lglocks code to kernel/locking/ 2013-11-06 09:24:20 +01:00
lockdep_internals.h lockdep: Increase static allocations 2014-04-18 14:20:50 +02:00
lockdep_proc.c lockdep/proc: Fix lock-time avg computation 2013-11-11 12:41:34 +01:00
lockdep_states.h locking: Move the lockdep code to kernel/locking/ 2013-11-06 07:55:08 +01:00
lockdep.c locking/lockdep: Revert qrwlock recusive stuff 2014-10-03 06:09:30 +02:00
locktorture.c locktorture: Cleanup header usage 2014-09-30 00:10:02 -07:00
Makefile locking/mcs: Better differentiate between MCS variants 2015-01-14 15:07:32 +01:00
mcs_spinlock.h locking/mcs: Better differentiate between MCS variants 2015-01-14 15:07:32 +01:00
mutex-debug.c mutex: Always clear owner field upon mutex_unlock() 2015-01-09 11:20:39 +01:00
mutex-debug.h locking: Move the mutex code to kernel/locking/ 2013-11-06 07:55:07 +01:00
mutex.c locking/mutex: Introduce ww_mutex_set_context_slowpath() 2015-01-14 15:07:30 +01:00
mutex.h locking/mutexes: Use MUTEX_SPIN_ON_OWNER when appropriate 2014-08-13 10:32:02 +02:00
osq_lock.c locking/osq: No need for load/acquire when acquire-polling 2015-01-14 15:16:20 +01:00
percpu-rwsem.c locking: Move the percpu-rwsem code to kernel/locking/ 2013-11-06 09:24:22 +01:00
qrwlock.c arch, locking: Ciao arch_mutex_cpu_relax() 2014-07-17 12:32:47 +02:00
rtmutex_common.h rtmutex: Cleanup deadlock detector debug logic 2014-06-21 22:05:30 +02:00
rtmutex-debug.c rtmutex: Cleanup deadlock detector debug logic 2014-06-21 22:05:30 +02:00
rtmutex-debug.h rtmutex: Cleanup deadlock detector debug logic 2014-06-21 22:05:30 +02:00
rtmutex-tester.c locking: Move the rtmutex code to kernel/locking/ 2013-11-06 09:23:59 +01:00
rtmutex.c locking/Documentation: Move locking related docs into Documentation/locking/ 2014-08-13 10:32:03 +02:00
rtmutex.h rtmutex: Cleanup deadlock detector debug logic 2014-06-21 22:05:30 +02:00
rwsem-spinlock.c locking/rwsem: Rename 'activity' to 'count' 2014-07-16 14:56:55 +02:00
rwsem-xadd.c locking/rwsem: Avoid double checking before try acquiring write lock 2014-10-03 06:09:29 +02:00
rwsem.c locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER 2014-07-16 14:57:13 +02:00
semaphore.c locking/semaphore: Resolve some shadow warnings 2014-09-04 07:17:24 +02:00
spinlock_debug.c locking: Move the spinlock code to kernel/locking/ 2013-11-06 07:55:21 +01:00
spinlock.c locking: Move the spinlock code to kernel/locking/ 2013-11-06 07:55:21 +01:00