linux_dsm_epyc7002/lib
Sergey Senozhatsky 166a0f780a lib/bsearch.c: micro-optimize pivot position calculation
There is a slightly faster way (in terms of the number of instructions
being used) to calculate the position of a middle element, preserving
integer overflow safeness.

./scripts/bloat-o-meter lib/bsearch.o.old lib/bsearch.o.new
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-24 (-24)
function                                     old     new   delta
bsearch                                      122      98     -24

TEST

INT array of size 100001, elements [0..100000]. gcc 7.1, Os, x86_64.

a) bsearch() of existing key "100001 - 2":

BASE
====

$ perf stat ./a.out

 Performance counter stats for './a.out':

        619.445196      task-clock:u (msec)       #    0.999 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
               133      page-faults:u             #    0.215 K/sec
     1,949,517,279      cycles:u                  #    3.147 GHz                      (83.06%)
       181,017,938      stalled-cycles-frontend:u #    9.29% frontend cycles idle     (83.05%)
        82,959,265      stalled-cycles-backend:u  #    4.26% backend cycles idle      (67.02%)
     4,355,706,383      instructions:u            #    2.23  insn per cycle
                                                  #    0.04  stalled cycles per insn  (83.54%)
     1,051,539,242      branches:u                # 1697.550 M/sec                    (83.54%)
        15,263,381      branch-misses:u           #    1.45% of all branches          (83.43%)

       0.620082548 seconds time elapsed

PATCHED
=======

$ perf stat ./a.out

 Performance counter stats for './a.out':

        475.097316      task-clock:u (msec)       #    0.999 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
               135      page-faults:u             #    0.284 K/sec
     1,487,467,717      cycles:u                  #    3.131 GHz                      (82.95%)
       186,537,162      stalled-cycles-frontend:u #   12.54% frontend cycles idle     (82.93%)
        28,797,869      stalled-cycles-backend:u  #    1.94% backend cycles idle      (67.10%)
     3,807,564,203      instructions:u            #    2.56  insn per cycle
                                                  #    0.05  stalled cycles per insn  (83.57%)
     1,049,344,291      branches:u                # 2208.693 M/sec                    (83.60%)
             5,485      branch-misses:u           #    0.00% of all branches          (83.58%)

       0.475760235 seconds time elapsed

b) bsearch() of un-existing key "100001 + 2":

BASE
====

$ perf stat ./a.out

 Performance counter stats for './a.out':

        499.244480      task-clock:u (msec)       #    0.999 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
               132      page-faults:u             #    0.264 K/sec
     1,571,194,855      cycles:u                  #    3.147 GHz                      (83.18%)
        13,450,980      stalled-cycles-frontend:u #    0.86% frontend cycles idle     (83.18%)
        21,256,072      stalled-cycles-backend:u  #    1.35% backend cycles idle      (66.78%)
     4,171,197,909      instructions:u            #    2.65  insn per cycle
                                                  #    0.01  stalled cycles per insn  (83.68%)
     1,009,175,281      branches:u                # 2021.405 M/sec                    (83.79%)
             3,122      branch-misses:u           #    0.00% of all branches          (83.37%)

       0.499871144 seconds time elapsed

PATCHED
=======

$ perf stat ./a.out

 Performance counter stats for './a.out':

        399.023499      task-clock:u (msec)       #    0.998 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
               134      page-faults:u             #    0.336 K/sec
     1,245,793,991      cycles:u                  #    3.122 GHz                      (83.39%)
        11,529,273      stalled-cycles-frontend:u #    0.93% frontend cycles idle     (83.46%)
        12,116,311      stalled-cycles-backend:u  #    0.97% backend cycles idle      (66.92%)
     3,679,710,005      instructions:u            #    2.95  insn per cycle
                                                  #    0.00  stalled cycles per insn  (83.47%)
     1,009,792,625      branches:u                # 2530.660 M/sec                    (83.46%)
             2,590      branch-misses:u           #    0.00% of all branches          (83.12%)

       0.399733539 seconds time elapsed

Link: http://lkml.kernel.org/r/20170607150457.5905-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:35 -07:00
..
842
fonts lib/fonts/Kconfig: keep non-Sparc fonts listed together 2017-02-27 18:43:46 -08:00
lz4 lib/lz4: remove back-compat wrappers 2017-02-24 17:46:57 -08:00
lzo
mpi
raid6 lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position 2017-05-16 10:01:57 +05:30
reed_solomon
xz
zlib_deflate
zlib_inflate lib/zlib_inflate/inftrees.c: fix potential buffer overflow 2017-05-08 17:15:12 -07:00
.gitignore
argv_split.c
asn1_decoder.c
assoc_array.c
atomic64_test.c lib: add module support to atomic64 tests 2017-02-24 17:46:57 -08:00
atomic64.c
audit.c
bcd.c
bch.c
bitmap.c bitmap: optimise bitmap_set and bitmap_clear of a single bit 2017-07-10 16:32:34 -07:00
bitrev.c
bsearch.c lib/bsearch.c: micro-optimize pivot position calculation 2017-07-10 16:32:35 -07:00
btree.c
bug.c debug: Add _ONCE() logic to report_bug() 2017-03-30 09:37:20 +02:00
build_OID_registry
bust_spinlocks.c
chacha20.c
check_signature.c
checksum.c
clz_ctz.c
clz_tab.c
cmdline.c lib/cmdline.c: fix get_options() overflow while parsing ranges 2017-06-23 16:15:55 -07:00
compat_audit.c
cordic.c
cpu_rmap.c
cpumask.c sched/fair, cpumask: Export for_each_cpu_wrap() 2017-05-15 10:15:23 +02:00
crc4.c lib: Add crc4 module 2017-06-09 11:52:07 +02:00
crc7.c
crc8.c
crc16.c
crc32.c lib: add module support to crc32 tests 2017-02-24 17:46:57 -08:00
crc32defs.h
crc32test.c lib: add module support to crc32 tests 2017-02-24 17:46:57 -08:00
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c
ctype.c
debug_info.c
debug_locks.c
debugobjects.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
dec_and_lock.c
decompress_bunzip2.c
decompress_inflate.c
decompress_unlz4.c lib/decompress_unlz4: change module to work with new LZ4 module version 2017-02-24 17:46:57 -08:00
decompress_unlzma.c
decompress_unlzo.c
decompress_unxz.c
decompress.c
devres.c devres: fix devm_ioremap_*() offset parameter kerneldoc description 2017-04-24 13:53:13 -05:00
digsig.c KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload() 2017-03-02 10:09:00 +11:00
div64.c
dma-debug.c dmaengine updates for 4.12-rc1 2017-05-09 15:40:28 -07:00
dma-noop.c dma: Take into account dma_pfn_offset 2017-06-28 06:55:01 -07:00
dma-virt.c dma-virt: remove dma_supported and mapping_error methods 2017-06-28 06:54:41 -07:00
dump_stack.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> 2017-03-02 08:42:34 +01:00
dynamic_debug.c
dynamic_queue_limits.c
earlycpio.c
errseq.c lib: add errseq_t type and infrastructure for handling it 2017-07-06 07:02:24 -04:00
extable.c lib/extable.c: use bsearch() library function in search_extable() 2017-07-10 16:32:35 -07:00
fault-inject.c lib/fault-inject.c: use correct check for interrupts 2017-05-08 17:15:12 -07:00
fdt_empty_tree.c
fdt_ro.c
fdt_rw.c
fdt_strerror.c
fdt_sw.c
fdt_wip.c
fdt.c
find_bit.c lib/find_bit.c: micro-optimise find_next_*_bit 2017-02-24 17:46:57 -08:00
flex_array.c
flex_proportions.c percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batch 2017-06-20 15:42:32 -04:00
gcd.c
gen_crc32table.c
genalloc.c
glob.c lib: add module support to glob tests 2017-02-24 17:46:57 -08:00
globtest.c lib: add module support to glob tests 2017-02-24 17:46:57 -08:00
hexdump.c
hweight.c
idr.c idr: Add missing __rcu annotations 2017-02-13 21:44:10 -05:00
inflate.c
int_sqrt.c
interval_tree_test.c lib/interval_tree_test.c: allow full tree search 2017-07-10 16:32:35 -07:00
interval_tree.c
iomap_copy.c
iomap.c
iommu-common.c
iommu-helper.c
ioremap.c mm: convert generic code to 5-level paging 2017-03-09 11:48:47 -08:00
iov_iter.c Merge branch 'uaccess-work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-07 20:39:20 -07:00
irq_poll.c
irq_regs.c
is_single_threaded.c sched/headers: Prepare to move 'init_task' and 'init_thread_union' from <linux/sched.h> to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
jedec_ddr_data.c
kasprintf.c
Kconfig libnvdimm for 4.13 2017-07-07 09:44:06 -07:00
Kconfig.debug lib/interval_tree_test.c: allow the module to be compiled-in 2017-07-10 16:32:35 -07:00
Kconfig.kasan
Kconfig.kgdb lib: update location of kgdb documentation 2017-05-16 08:44:22 -03:00
Kconfig.kmemcheck
Kconfig.ubsan
kfifo.c
klist.c
kobject_uevent.c kobject: support passing in variables for synthetic uevents 2017-05-25 18:30:51 +02:00
kobject.c kobject: Export kobject_get_unless_zero() 2017-03-22 20:11:35 -06:00
kstrtox.c lib/kstrtox.c: use "unsigned int" more 2017-07-10 16:32:34 -07:00
kstrtox.h
lcm.c
libcrc32c.c crypto: Work around deallocated stack frame reference gcc bug on sparc. 2017-06-08 17:36:03 +08:00
list_debug.c bug: switch data corruption check to __must_check 2017-02-24 17:46:56 -08:00
list_sort.c lib: add module support to linked list sorting tests 2017-05-08 17:15:10 -07:00
llist.c
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-rtmutex.h locking/selftest: Add RT-mutex support 2017-06-08 10:35:50 +02:00
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c locking/selftest: Add RT-mutex support 2017-06-08 10:35:50 +02:00
lockref.c
lru_cache.c
Makefile Writeback error handling fixes (pile #2) 2017-07-07 19:38:17 -07:00
memory-notifier-error-inject.c
memweight.c
net_utils.c
netdev-notifier-error-inject.c
nlattr.c net: manual clean code which call skb_put_[data:zero] 2017-06-20 13:30:15 -04:00
nmi_backtrace.c printk: Use the main logbuf in NMI when logbuf_lock is available 2017-05-19 14:42:19 +02:00
nodemask.c
notifier-error-inject.c
notifier-error-inject.h
of-reconfig-notifier-error-inject.c
oid_registry.c
once.c
parman.c lib: Introduce priority array area manager 2017-02-03 16:35:42 -05:00
parser.c
pci_iomap.c
percpu_counter.c percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batch 2017-06-20 15:42:32 -04:00
percpu_ida.c sched/headers: Prepare to remove the <linux/gfp.h> include from <linux/sched.h> 2017-03-02 08:42:34 +01:00
percpu_test.c
percpu-refcount.c percpu-refcount: support synchronous switch to atomic mode. 2017-03-22 19:18:43 -07:00
plist.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> 2017-03-02 08:42:27 +01:00
pm-notifier-error-inject.c
prime_numbers.c
radix-tree.c lockdep: allow to disable reclaim lockup detection 2017-05-03 15:52:09 -07:00
random32.c
ratelimit.c
rational.c
rbtree_test.c
rbtree.c rbtree: use designated initializers 2017-02-24 17:46:57 -08:00
reciprocal_div.c
refcount.c locking/refcount: Create unchecked atomic_t implementation 2017-06-28 18:54:46 +02:00
rhashtable.c lib/rhashtable.c: use kvzalloc() in bucket_table_alloc() when possible 2017-07-10 16:32:35 -07:00
sbitmap.c sbitmap: add sbitmap_get_shallow() operation 2017-04-14 14:06:52 -06:00
scatterlist.c scatterlist: add sg_zero_buffer() helper 2017-06-15 14:30:14 +02:00
seq_buf.c
sg_pool.c
sg_split.c
sha1.c
show_mem.c lib/show_mem.c: teach show_mem to work with the given nodemask 2017-02-22 16:41:30 -08:00
siphash.c
smp_processor_id.c sched/core: Enable might_sleep() and smp_processor_id() checks early 2017-05-23 10:01:38 +02:00
sort.c lib: add CONFIG_TEST_SORT to enable self-test of sort() 2017-02-24 17:46:57 -08:00
stackdepot.c
stmp_device.c
string_helpers.c
string.c USB patches for 4.12-rc1 2017-05-04 18:03:51 -07:00
strncpy_from_user.c
strnlen_user.c kill strlen_user() 2017-05-15 23:40:22 -04:00
swiotlb.c
syscall.c lib/syscall: Clear return values when no stack 2017-03-24 07:43:35 +01:00
test_bitmap.c bitmap: optimise bitmap_set and bitmap_clear of a single bit 2017-07-10 16:32:34 -07:00
test_bpf.c net: introduce __skb_put_[zero, data, u8] 2017-06-20 13:30:14 -04:00
test_firmware.c
test_hash.c
test_hexdump.c
test_kasan.c kasan: report only the first error by default 2017-03-31 17:13:30 -07:00
test_list_sort.c lib: add module support to linked list sorting tests 2017-05-08 17:15:10 -07:00
test_module.c
test_parman.c lib: fix spelling mistake: "actualy" -> "actually" 2017-02-26 11:03:38 -05:00
test_printf.c
test_rhashtable.c
test_siphash.c
test_sort.c Revert "lib/test_sort.c: make it explicitly non-modular" 2017-05-08 17:15:10 -07:00
test_static_key_base.c
test_static_keys.c
test_user_copy.c lib: remove check for AVR32 arch in test_user_copy 2017-05-01 09:36:30 +02:00
test_uuid.c uuid: hoist helpers uuid_equal() and uuid_copy() from xfs 2017-06-05 16:59:04 +02:00
test-kstrtox.c
test-string_helpers.c
textsearch.c
timerqueue.c
ts_bm.c
ts_fsm.c
ts_kmp.c
ubsan.c
ubsan.h
ucs2_string.c
usercopy.c copy_{from,to}_user(): move kasan checks and might_fault() out-of-line 2017-06-29 22:21:20 -04:00
uuid.c uuid: hoist uuid_is_null() helper from libnvdimm 2017-06-05 16:59:05 +02:00
vsprintf.c DeviceTree for 4.13: 2017-07-07 10:37:54 -07:00
win_minmax.c