linux_dsm_epyc7002/lib
Peter Zijlstra f0f1d32f93 llist: Remove cpu_relax() usage in cmpxchg loops
Initial benchmarks show they're a net loss:

 $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
 $ echo 4096 32000 64 128 > /proc/sys/kernel/sem
 $ ./sembench -t 2048 -w 1900 -o 0

Pre:

 run time 30 seconds 778936 worker burns per second
 run time 30 seconds 912190 worker burns per second
 run time 30 seconds 817506 worker burns per second
 run time 30 seconds 830870 worker burns per second
 run time 30 seconds 845056 worker burns per second

Post:

 run time 30 seconds 905920 worker burns per second
 run time 30 seconds 849046 worker burns per second
 run time 30 seconds 886286 worker burns per second
 run time 30 seconds 822320 worker burns per second
 run time 30 seconds 900283 worker burns per second

So about 4% faster. (!)

cpu_relax() stalls the pipeline, therefore, when used in a tight loop
it has the following benefits:

 - allows SMT siblings to have a go;
 - reduces pressure on the CPU interconnect.

However, cmpxchg loops are unfair and thus have unbounded completion
time, therefore we should avoid getting in such heavily contended
situations where the above benefits make any difference.

A typical cmpxchg loop should not go round more than a handfull of
times at worst, therefore adding extra delays just slows things down.

Since the llist primitives are new, there aren't any bad users yet,
and we should avoid growing them. Heavily contended sites should
generally be better off using the ticket locks for serialization since
they provide bounded completion times (fifo-fair over the cpus).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1315836358.26517.43.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-04 12:44:03 +02:00
..
lzo lib: add support for LZO-compressed kernels 2010-01-11 09:34:04 -08:00
raid6 Move .gitignore from drivers/md to lib/raid6 2010-08-30 17:35:52 +10:00
reed_solomon lib: Remove unnecessary inclusions of asm/semaphore.h 2008-04-18 22:17:17 -04:00
xz XZ: Fix incorrect XZ_BUF_ERROR 2011-09-21 13:39:59 -07:00
zlib_deflate zlib: slim down zlib_deflate() workspace when possible 2011-03-22 17:44:17 -07:00
zlib_inflate inflate_fast: sout is already a short so ptr arith was off by one. 2010-03-12 15:52:44 -08:00
.gitignore
argv_split.c tree-wide: convert open calls to remove spaces to skip_spaces() lib function 2009-12-15 08:53:32 -08:00
atomic64_test.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
atomic64.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
audit.c audit: support the "standard" <asm-generic/unistd.h> 2011-05-04 14:41:28 -04:00
average.c lib: Improve EWMA efficiency by using bitshifts 2010-12-06 15:58:43 -05:00
bcd.c rtc: BCD codeshrink 2008-07-24 10:47:33 -07:00
bch.c lib: add shared BCH ECC library 2011-03-11 14:25:50 +00:00
bitmap.c Merge branch 'apei' into apei-release 2011-08-03 11:30:42 -04:00
bitrev.c lib: export bitrev16 2008-06-06 11:29:10 -07:00
bsearch.c lib: Add generic binary search function to the kernel. 2011-05-19 16:55:27 +09:30
btree.c Fix common misspellings 2011-03-31 11:26:23 -03:00
bug.c modules: Fix module_bug_list list corruption race 2010-10-05 11:29:27 -07:00
bust_spinlocks.c oops handling: ensure that any oops is flushed to the mtdoops console 2009-01-06 15:59:11 -08:00
check_signature.c uninline check_signature() 2007-07-16 09:05:50 -07:00
checksum.c lib/checksum.c: optimize do_csum a bit 2011-07-07 04:52:24 -07:00
cmdline.c generic, memparse(): constify argument 2008-07-28 15:05:23 +02:00
cordic.c lib: cordic: add library module providing cordic angle calculation 2011-06-03 15:01:07 -04:00
cpu_rmap.c lib: cpu_rmap: CPU affinity reverse-mapping 2011-01-24 14:51:56 -08:00
cpu-notifier-error-inject.c fault-injection: add CPU notifier error injection module 2010-05-27 09:12:48 -07:00
cpumask.c cpumask: alloc_cpumask_var() use NUMA_NO_NODE 2011-07-26 16:49:44 -07:00
crc7.c CRC7 support 2007-07-17 10:23:04 -07:00
crc8.c lib: crc8: add new library module providing crc8 algorithm 2011-06-03 15:01:06 -04:00
crc16.c
crc32.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
crc32defs.h
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c [SCSI] lib: Add support for the T10 (SCSI) Data Integrity Field CRC 2008-07-12 08:22:32 -05:00
ctype.c ctype: constify read-only _ctype string 2009-12-15 08:53:32 -08:00
debug_locks.c Revert "debug_locks: set oops_in_progress if we will log messages." 2010-11-29 15:18:28 -08:00
debugobjects.c debugobjects: Fix boot crash when kmemleak and debugobjects enabled 2011-06-20 14:38:43 +02:00
dec_and_lock.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
decompress_bunzip2.c Decompressors: include <linux/slab.h> in <linux/decompress/mm.h> 2011-01-13 08:03:23 -08:00
decompress_inflate.c decompressors: check input size in decompress_inflate.c 2011-01-13 08:03:25 -08:00
decompress_unlzma.c Decompressors: validate match distance in decompress_unlzma.c 2011-01-13 08:03:24 -08:00
decompress_unlzo.c Decompressors: fix callback-to-callback mode in decompress_unlzo.c 2011-01-13 08:03:24 -08:00
decompress_unxz.c Fix common misspellings 2011-03-31 11:26:23 -03:00
decompress.c decompressors: add boot-time XZ support 2011-01-13 08:03:25 -08:00
devres.c devres: fix possible use after free 2011-07-25 20:57:14 -07:00
div64.c div64_u64(): improve precision on 32bit platforms 2010-10-26 16:52:19 -07:00
dma-debug.c dma-debug: print information about leaked entry 2011-04-07 16:31:19 +02:00
dump_stack.c
dynamic_debug.c dynamic_debug: add #include <linux/sched.h> 2011-02-03 15:59:58 -08:00
extable.c module: trim exception table on init free. 2009-06-12 21:47:04 +09:30
fault-inject.c fault-injection: add ability to export fault_attr in arbitrary directory 2011-08-03 14:25:20 -10:00
find_last_bit.c bitops: add #ifndef for each of find bitops 2011-05-26 17:12:38 -07:00
find_next_bit.c arch: remove CONFIG_GENERIC_FIND_{NEXT_BIT,BIT_LE,LAST_BIT} 2011-05-26 17:12:38 -07:00
flex_array.c flex_array: avoid divisions when accessing elements 2011-05-26 17:12:33 -07:00
gcd.c lib: add lib/gcd.c 2009-06-18 13:04:05 -07:00
gen_crc32table.c crc32: major optimization 2010-05-25 08:07:06 -07:00
genalloc.c lib, Make gen_pool memory allocator lockless 2011-08-03 11:15:57 -04:00
halfmd4.c
hexdump.c include/linux/printk.h lib/hexdump.c: neatening and add CONFIG_PRINTK guard 2011-01-13 08:03:10 -08:00
hweight.c x86: Add optimized popcnt variants 2010-04-06 15:52:11 -07:00
idr.c ida: simplified functions for id allocation 2011-08-03 14:25:20 -10:00
inflate.c MN10300: Don't try and #include <linux/slab.h> in lib/inflate.c from bootloader 2010-08-12 09:51:35 -07:00
int_sqrt.c
iomap_copy.c
iomap.c iomap: make IOPORT/PCI mapping functions conditional 2011-07-22 18:46:26 +02:00
iommu-helper.c iommu: inline iommu_num_pages 2010-08-09 20:45:05 -07:00
ioremap.c ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support 2011-01-12 03:06:19 -05:00
irq_regs.c
is_single_threaded.c kernel: is_current_single_threaded: don't use ->mmap_sem 2009-07-17 09:11:31 +10:00
kasprintf.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
Kconfig llist: Make some llist functions inline 2011-10-04 11:30:53 +02:00
Kconfig.debug Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-07-22 16:45:02 -07:00
Kconfig.kgdb mips,kgdb: kdb low level trap catch and stack trace 2010-05-20 21:04:26 -05:00
Kconfig.kmemcheck kmemcheck: depend on HAVE_ARCH_KMEMCHECK 2009-07-01 22:28:44 +02:00
klist.c driver core: Remove completion from struct klist_node 2009-01-06 10:44:30 -08:00
kobject_uevent.c kobject_uevent: fix typo in comments 2010-08-23 18:12:46 -07:00
kobject.c Delay struct net freeing while there's a sysfs instance refering to it 2011-06-12 17:45:41 -04:00
kref.c kref: Add a kref_sub function 2010-11-22 13:25:13 +10:00
kstrtox.c lib: make _tolower() public 2011-07-25 20:57:16 -07:00
lcm.c lib/lcm.c: quiet sparse noise 2011-07-25 20:57:15 -07:00
libcrc32c.c libcrc32c: Fix "crc32c undefined" compilation error 2008-12-25 11:01:42 +11:00
list_debug.c Expand CONFIG_DEBUG_LIST to several other list operations 2011-02-18 11:32:28 -08:00
list_sort.c lib/list_sort: test: check element addresses 2010-10-26 16:52:19 -07:00
llist.c llist: Remove cpu_relax() usage in cmpxchg loops 2011-10-04 12:44:03 +02:00
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c rcu: Fix unpaired rcu_irq_enter() from locking selftests 2011-05-26 09:42:19 -07:00
lru_cache.c lru_cache: use correct type in sizeof for allocation 2011-05-25 08:39:52 -07:00
Makefile llist: Make some llist functions inline 2011-10-04 11:30:53 +02:00
md5.c crypto: Move md5_transform to lib/md5.c 2011-08-06 18:32:45 -07:00
nlattr.c net: fix nla_policy_len to actually _iterate_ over the policy 2011-02-28 12:38:25 -08:00
parser.c Fix common misspellings 2011-03-31 11:26:23 -03:00
percpu_counter.c percpucounter: Optimize __percpu_counter_add a bit through the use of this_cpu() options. 2010-12-17 15:07:18 +01:00
plist.c plist: Remove the need to supply locks to plist heads 2011-07-08 14:02:53 +02:00
prio_heap.c lib: fix sparse shadowed variable warning 2009-01-06 15:59:11 -08:00
prio_tree.c
proportions.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-01-06 17:10:04 -08:00
radix-tree.c tmpfs radix_tree: locate_item to speed up swapoff 2011-08-03 14:25:24 -10:00
random32.c Merge branch 'master' into for-next 2010-06-16 18:08:13 +02:00
ratelimit.c ratelimit: fix the return value when __ratelimit() fails to acquire the lock 2010-04-07 08:38:04 -07:00
rational.c lib/rational.c needs module.h 2010-01-11 09:34:05 -08:00
rbtree.c Export the augmented rbtree helper functions 2011-01-28 12:16:59 +10:00
reciprocal_div.c
rwsem-spinlock.c rwsem generic spinlock: use IRQ save/restore spinlocks 2010-04-07 16:15:05 -07:00
rwsem.c rwsem: Remove redundant asmregparm annotation 2011-01-27 12:30:40 +01:00
scatterlist.c scatterlist: prevent invalid free when alloc fails 2010-08-30 19:55:09 +02:00
sha1.c lib/sha1.c: quiet sparse noise about symbol not declared 2011-09-13 16:09:41 -07:00
show_mem.c arch, mm: filter disallowed nodes from arch specific show_mem functions 2011-05-25 08:39:03 -07:00
smp_processor_id.c cpumask: convert lib/smp_processor_id to new cpumask ops 2009-01-30 15:47:34 +01:00
sort.c generic swap(): lib/sort.c: rename swap to swap_func 2009-01-08 08:31:14 -08:00
spinlock_debug.c locking: Further name space cleanups 2009-12-14 23:55:33 +01:00
string_helpers.c [SCSI] lib: string_get_size(): don't hang on zero; no decimals on exact 2008-10-23 11:42:20 -05:00
string.c Add a strtobool function matching semantics of existing in kernel equivalents 2011-05-19 16:55:28 +09:30
swiotlb.c swiotlb: Export swioltb_nr_tbl and utilize it as appropiate. 2011-06-06 15:41:16 -04:00
syscall.c task_current_syscall 2008-07-26 12:00:10 -07:00
test-kstrtox.c kstrtox: fix compile warnings in test 2011-04-14 16:06:54 -07:00
textsearch.c textsearch: doc - fix spelling in lib/textsearch.c. 2011-01-24 23:33:30 -08:00
timerqueue.c Fix common misspellings 2011-03-31 11:26:23 -03:00
ts_bm.c textsearch: ts_bm: support case insensitive searching in Boyer-Moore algorithm 2008-07-08 02:37:54 -07:00
ts_fsm.c textsearch: ts_fsm: return error on request for case insensitive search 2008-07-08 02:38:27 -07:00
ts_kmp.c textsearch: ts_kmp: support case insensitive searching in Knuth-Morris-Pratt algorithm 2008-07-08 02:38:09 -07:00
uuid.c Unified UUID/GUID definition 2010-05-19 22:40:47 -04:00
vsprintf.c Merge 'akpm' patch series 2011-07-25 21:00:19 -07:00