mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-12-19 00:57:49 +07:00
231457ec70
Benchmark the various operations allowed for epoll_ctl(2). The idea is
to concurrently stress a single epoll instance doing add/mod/del
operations.

Committer testing:

  # perf bench epoll ctl
  # Running 'epoll/ctl' benchmark:
  Run summary [PID 20344]: 4 threads doing epoll_ctl ops 64 file-descriptors for 8 secs.

  [thread  0] fdmap: 0x21a46b0 ... 0x21a47ac [ add: 1680960 ops; mod: 1680960 ops; del: 1680960 ops ]
  [thread  1] fdmap: 0x21a4960 ... 0x21a4a5c [ add: 1685440 ops; mod: 1685440 ops; del: 1685440 ops ]
  [thread  2] fdmap: 0x21a4c10 ... 0x21a4d0c [ add: 1674368 ops; mod: 1674368 ops; del: 1674368 ops ]
  [thread  3] fdmap: 0x21a4ec0 ... 0x21a4fbc [ add: 1677568 ops; mod: 1677568 ops; del: 1677568 ops ]

  Averaged 1679584 ADD operations (+- 0.14%)
  Averaged 1679584 MOD operations (+- 0.14%)
  Averaged 1679584 DEL operations (+- 0.14%)
  #

Let's measure those calls with 'perf trace' to get a glimpse at what this
benchmark is doing in terms of syscalls:

  # perf trace -m32768 -s perf bench epoll ctl
  # Running 'epoll/ctl' benchmark:
  Run summary [PID 20405]: 4 threads doing epoll_ctl ops 64 file-descriptors for 8 secs.

  [thread  0] fdmap: 0x21764e0 ... 0x21765dc [ add: 1100480 ops; mod: 1100480 ops; del: 1100480 ops ]
  [thread  1] fdmap: 0x2176790 ... 0x217688c [ add: 1250176 ops; mod: 1250176 ops; del: 1250176 ops ]
  [thread  2] fdmap: 0x2176a40 ... 0x2176b3c [ add: 1022464 ops; mod: 1022464 ops; del: 1022464 ops ]
  [thread  3] fdmap: 0x2176cf0 ... 0x2176dec [ add: 705472 ops; mod: 705472 ops; del: 705472 ops ]

  Averaged 1019648 ADD operations (+- 11.27%)
  Averaged 1019648 MOD operations (+- 11.27%)
  Averaged 1019648 DEL operations (+- 11.27%)

  Summary of events:

  epoll-ctl (20405), 1264 events, 0.0%

    syscall              calls     total       min       avg       max    stddev
                                  (msec)    (msec)    (msec)    (msec)       (%)
    ----------------- -------- --------- --------- --------- --------- ---------
    eventfd2               256     9.514     0.001     0.037     5.243    68.00%
    clone                    4     1.245     0.204     0.311     0.531    24.13%
    mprotect                66     0.345     0.002     0.005     0.021     7.43%
    openat                  45     0.313     0.004     0.007     0.073    21.93%
    mmap                    88     0.302     0.002     0.003     0.013     5.02%
    futex                    4     0.160     0.002     0.040     0.140    83.43%
    sched_setaffinity        4     0.124     0.005     0.031     0.070    49.39%
    read                    44     0.103     0.001     0.002     0.013    15.54%
    fstat                   40     0.052     0.001     0.001     0.003     5.43%
    close                   39     0.039     0.001     0.001     0.001     1.48%
    stat                     9     0.034     0.003     0.004     0.006     7.30%
    access                   3     0.023     0.007     0.008     0.008     4.25%
    open                     2     0.021     0.008     0.011     0.013    22.60%
    getdents                 4     0.019     0.001     0.005     0.009    37.15%
    write                    2     0.013     0.004     0.007     0.009    38.48%
    munmap                   1     0.010     0.010     0.010     0.010     0.00%
    brk                      3     0.006     0.001     0.002     0.003    26.34%
    rt_sigprocmask           2     0.004     0.001     0.002     0.003    43.95%
    rt_sigaction             3     0.004     0.001     0.001     0.002    16.07%
    prlimit64                3     0.004     0.001     0.001     0.001     5.39%
    prctl                    1     0.003     0.003     0.003     0.003     0.00%
    epoll_create             1     0.003     0.003     0.003     0.003     0.00%
    lseek                    2     0.002     0.001     0.001     0.001    11.42%
    sched_getaffinity        1     0.002     0.002     0.002     0.002     0.00%
    arch_prctl               1     0.002     0.002     0.002     0.002     0.00%
    set_tid_address          1     0.001     0.001     0.001     0.001     0.00%
    getpid                   1     0.001     0.001     0.001     0.001     0.00%
    set_robust_list          1     0.001     0.001     0.001     0.001     0.00%
    execve                   1     0.000     0.000     0.000     0.000     0.00%

  epoll-ctl (20406), 1245480 events, 14.6%

    syscall              calls     total       min       avg       max    stddev
                                  (msec)    (msec)    (msec)    (msec)       (%)
    ----------------- -------- --------- --------- --------- --------- ---------
    epoll_ctl           619511  1034.927     0.001     0.002     6.691     0.67%
    nanosleep             3226   616.114     0.006     0.191    10.376     7.57%
    futex                    2    11.336     0.002     5.668    11.334    99.97%
    set_robust_list          1     0.001     0.001     0.001     0.001     0.00%
    clone                    1     0.000     0.000     0.000     0.000     0.00%

  epoll-ctl (20407), 1243151 events, 14.5%

    syscall              calls     total       min       avg       max    stddev
                                  (msec)    (msec)    (msec)    (msec)       (%)
    ----------------- -------- --------- --------- --------- --------- ---------
    epoll_ctl           618350  1042.181     0.001     0.002     2.512     0.40%
    nanosleep             3220   366.261     0.012     0.114    18.162     9.59%
    futex                    4     5.463     0.001     1.366     5.427    99.12%
    set_robust_list          1     0.002     0.002     0.002     0.002     0.00%

  epoll-ctl (20408), 1801690 events, 21.1%

    syscall              calls     total       min       avg       max    stddev
                                  (msec)    (msec)    (msec)    (msec)       (%)
    ----------------- -------- --------- --------- --------- --------- ---------
    epoll_ctl           896174  1540.581     0.001     0.002     6.987     0.74%
    nanosleep             4667   783.393     0.006     0.168    10.419     7.10%
    futex                    2     4.682     0.002     2.341     4.681    99.93%
    set_robust_list          1     0.002     0.002     0.002     0.002     0.00%
    clone                    1     0.000     0.000     0.000     0.000     0.00%

  epoll-ctl (20409), 4254890 events, 49.8%

    syscall              calls     total       min       avg       max    stddev
                                  (msec)    (msec)    (msec)    (msec)       (%)
    ----------------- -------- --------- --------- --------- --------- ---------
    epoll_ctl          2116416  3768.097     0.001     0.002     9.956     0.41%
    nanosleep            11023  1141.778     0.006     0.104     9.447     4.95%
    futex                    3     0.037     0.002     0.012     0.029    70.50%
    set_robust_list          1     0.008     0.008     0.008     0.008     0.00%
    madvise                  1     0.005     0.005     0.005     0.005     0.00%
    clone                    1     0.000     0.000     0.000     0.000     0.00%
  #

Committer notes:

Fix build on fedora:24-x-ARC-uClibc, debian:experimental-x-mips,
debian:experimental-x-mipsel, ubuntu:16.04-x-arm and
ubuntu:16.04-x-powerpc

    CC       /tmp/build/perf/bench/epoll-ctl.o
  bench/epoll-ctl.c: In function 'init_fdmaps':
  bench/epoll-ctl.c:214:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
    for (i = 0; i < nfds; i+=inc) {
                  ^
  bench/epoll-ctl.c: In function 'bench_epoll_ctl':
  bench/epoll-ctl.c:377:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
    for (i = 0; i < nthreads; i++) {
                  ^
  bench/epoll-ctl.c:388:16: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
    for (i = 0; i < nthreads; i++) {
                  ^
  cc1: all warnings being treated as errors

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/20181106152226.20883-3-dave@stgolabs.net
[ Use inttypes.h to print rlim_t fields, fixing the build on Alpine Linux / musl libc ]
[ Check if eventfd() is available, i.e. if HAVE_EVENTFD is defined ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
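The "Averaged ... (+- x%)" lines in the two runs quoted in the commit message can be reproduced from the per-thread counts. The sketch below assumes the percentage is the standard error of the mean relative to the average; that assumption matches both quoted runs but is not taken from the benchmark source. The name `summarize` is illustrative and not part of perf.

```python
import math

def summarize(per_thread_ops):
    """Return (average, relative standard error of the mean in percent)."""
    n = len(per_thread_ops)
    avg = sum(per_thread_ops) / n
    # Sample variance (n - 1 in the denominator), then variance of the mean.
    variance = sum((x - avg) ** 2 for x in per_thread_ops) / (n - 1)
    stderr = math.sqrt(variance / n)
    return avg, 100.0 * stderr / avg

# Per-thread ADD counts copied from the two runs in the commit message.
plain = summarize([1680960, 1685440, 1674368, 1677568])
traced = summarize([1100480, 1250176, 1022464, 705472])
print("%d ops (+- %.2f%%)" % plain)    # 1679584 ops (+- 0.14%)
print("%d ops (+- %.2f%%)" % traced)   # 1019648 ops (+- 11.27%)
```

Under 'perf trace' the spread between threads grows from 0.14% to 11.27%, which suggests that tracing perturbs the threads unevenly.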
220 lines
4.3 KiB
Plaintext
perf-bench(1)
=============

NAME
----
perf-bench - General framework for benchmark suites

SYNOPSIS
--------
[verse]
'perf bench' [<common options>] <subsystem> <suite> [<options>]

DESCRIPTION
-----------
This 'perf bench' command is a general framework for benchmark suites.

COMMON OPTIONS
--------------
-r::
--repeat=::
Specify the number of times to repeat the run (default 10).

-f::
--format=::
Specify format style.
Currently available format styles are:

'default'::
Default style. This is mainly for human reading.
---------------------
% perf bench sched pipe                      # with no style specified
(executing 1000000 pipe operations between two tasks)
        Total time:5.855 sec
                5.855061 usecs/op
                170792 ops/sec
---------------------

'simple'::
This simple style is friendly for automated
processing by scripts.
---------------------
% perf bench --format=simple sched pipe      # specified simple
5.988
---------------------
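As a sketch of the kind of script the 'simple' style is meant for, the snippet below parses the output of the `sched pipe` example shown here. It assumes the result is a single number of seconds on the last non-comment line. Skipping lines that start with `#` is a defensive assumption, and both helper names are hypothetical.

```python
import subprocess

def parse_simple(output):
    """Return the last non-empty, non-comment line of the output as a float."""
    lines = [ln.strip() for ln in output.splitlines()]
    values = [ln for ln in lines if ln and not ln.startswith("#")]
    return float(values[-1])

def run_sched_pipe_simple():
    """Run the documented command and parse its result (requires perf)."""
    proc = subprocess.run(
        ["perf", "bench", "--format=simple", "sched", "pipe"],
        check=True, capture_output=True, text=True,
    )
    return parse_simple(proc.stdout)

# Parsing the sample output from the example above:
print(parse_simple("5.988\n"))
```

Only `parse_simple` is exercised here; `run_sched_pipe_simple` needs a working perf binary on the PATH.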

SUBSYSTEM
---------

'sched'::
	Scheduler and IPC mechanisms.

'mem'::
	Memory access performance.

'numa'::
	NUMA scheduling and MM benchmarks.

'futex'::
	Futex stressing benchmarks.

'epoll'::
	Eventpoll (epoll) stressing benchmarks.

'all'::
	All benchmark subsystems.

SUITES FOR 'sched'
~~~~~~~~~~~~~~~~~~
*messaging*::
Suite for evaluating performance of scheduler and IPC mechanisms.
Based on hackbench by Rusty Russell.

Options of *messaging*
^^^^^^^^^^^^^^^^^^^^^^
-p::
--pipe::
Use pipe() instead of socketpair().

-t::
--thread::
Use threads instead of processes.

-g::
--group=::
Specify number of groups.

-l::
--nr_loops=::
Specify number of loops.

Example of *messaging*
^^^^^^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched messaging                 # run with default
options (20 sender and receiver processes per group)
(10 groups == 400 processes run)

      Total time:0.308 sec

% perf bench sched messaging -t -g 20        # be multi-thread, with 20 groups
(20 sender and receiver threads per group)
(20 groups == 800 threads run)

      Total time:0.582 sec
---------------------
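The task counts printed in the example follow from the group count: each group has 20 senders and 20 receivers. A small sketch of that arithmetic, where the function name and the `per_side` parameter are illustrative only:

```python
def messaging_tasks(groups=10, per_side=20):
    """Senders plus receivers across all groups."""
    return groups * per_side * 2

print(messaging_tasks())            # 400, the default run
print(messaging_tasks(groups=20))   # 800, the -g 20 run
```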

*pipe*::
Suite for pipe() system call.
Based on pipe-test-1m.c by Ingo Molnar.

Options of *pipe*
^^^^^^^^^^^^^^^^^
-l::
--loop=::
Specify number of loops.

Example of *pipe*
^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched pipe
(executing 1000000 pipe operations between two tasks)

        Total time:8.091 sec
                8.091833 usecs/op
                123581 ops/sec

% perf bench sched pipe -l 1000              # loop 1000
(executing 1000 pipe operations between two tasks)

        Total time:0.016 sec
                16.948000 usecs/op
                59004 ops/sec
---------------------
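To illustrate what "pipe operations between two tasks" measures and how the two derived figures relate to the total time, here is a minimal sketch. It uses two Python threads and a pair of pipes, which is an assumption made for brevity and not how the suite itself is implemented. Its timings include interpreter overhead, so they are not comparable to perf's numbers.

```python
import os
import threading
import time

def pipe_pingpong(loops):
    """Bounce one byte back and forth over two pipes; return elapsed seconds."""
    to_worker_r, to_worker_w = os.pipe()
    to_main_r, to_main_w = os.pipe()

    def worker():
        for _ in range(loops):
            os.read(to_worker_r, 1)      # wait for the ping
            os.write(to_main_w, b"x")    # answer with a pong

    thread = threading.Thread(target=worker)
    start = time.perf_counter()
    thread.start()
    for _ in range(loops):
        os.write(to_worker_w, b"x")
        os.read(to_main_r, 1)
    thread.join()
    elapsed = time.perf_counter() - start
    for fd in (to_worker_r, to_worker_w, to_main_r, to_main_w):
        os.close(fd)
    return elapsed

def derive(loops, elapsed_sec):
    """Return (usecs per op, whole ops per second) from a total time."""
    return elapsed_sec * 1e6 / loops, int(loops / elapsed_sec)

usecs, ops = derive(1000000, 8.091833)   # total time from the first example
print("%.6f usecs/op, %d ops/sec" % (usecs, ops))
print("local run: %.6f usecs/op, %d ops/sec" % derive(1000, pipe_pingpong(1000)))
```

Applying `derive` to the second example (1000 loops in 0.016948 seconds) gives the 16.948 usecs/op and 59004 ops/sec shown there.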

SUITES FOR 'mem'
~~~~~~~~~~~~~~~~
*memcpy*::
Suite for evaluating performance of simple memory copy in various ways.

Options of *memcpy*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to copy (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to copy (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-movsq and x86-64-movsb are supported.

-l::
--nr_loops::
Repeat memcpy invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of the gettimeofday syscall.

*memset*::
Suite for evaluating performance of simple memory set in various ways.

Options of *memset*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to set (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-stosq and x86-64-stosb are supported.

-l::
--nr_loops::
Repeat memset invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of the gettimeofday syscall.
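The size and loop options of the 'mem' suites can be pictured with a short sketch. It assumes binary multiples (1KB = 1024 bytes), which this page does not state, and it times a plain `bytearray` copy as a stand-in for the architecture-specific copy functions. `parse_size` and `copy_throughput` are hypothetical names.

```python
import time

# Assumed binary multiples for the documented units.
UNITS = {"B": 1, "KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30, "TB": 1 << 40}

def parse_size(text):
    """Parse strings such as '1MB' or '4kb' (case insensitive) into bytes."""
    s = text.strip().upper()
    # Two-letter suffixes are tried first so 'KB' is not read as 'B'.
    for unit in ("KB", "MB", "GB", "TB", "B"):
        if s.endswith(unit):
            return int(s[:-len(unit)]) * UNITS[unit]
    return int(s)  # a bare number is taken as bytes

def copy_throughput(size, nr_loops=1):
    """Copy `size` bytes `nr_loops` times; return bytes per second."""
    src = bytearray(size)
    dst = bytearray(size)
    start = time.perf_counter()
    for _ in range(nr_loops):
        dst[:] = src
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return size * nr_loops / elapsed

size = parse_size("1MB")
print(size, "bytes,", "%.1f MB/sec" % (copy_throughput(size, 10) / (1 << 20)))
```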

SUITES FOR 'numa'
~~~~~~~~~~~~~~~~~
*mem*::
Suite for evaluating NUMA workloads.

SUITES FOR 'futex'
~~~~~~~~~~~~~~~~~~
*hash*::
Suite for evaluating hash tables.

*wake*::
Suite for evaluating wake calls.

*wake-parallel*::
Suite for evaluating parallel wake calls.

*requeue*::
Suite for evaluating requeue calls.

*lock-pi*::
Suite for evaluating futex lock_pi calls.

SUITES FOR 'epoll'
~~~~~~~~~~~~~~~~~~
*wait*::
Suite for evaluating concurrent epoll_wait calls.

*ctl*::
Suite for evaluating multiple epoll_ctl calls.
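As a rough picture of what the *ctl* suite stresses, the sketch below has several threads issue add/mod/del operations against one shared epoll instance through Python's `select.epoll`. In that wrapper, `register`, `modify` and `unregister` correspond to EPOLL_CTL_ADD, EPOLL_CTL_MOD and EPOLL_CTL_DEL. Pipes stand in for whatever descriptors the real suite uses, and the fixed `rounds` count replaces its time-based run. Both are assumptions, and the function name is hypothetical. Linux only.

```python
import os
import select
import threading

def stress_epoll_ctl(nthreads=4, nfds=16, rounds=10):
    """Return per-thread add/mod/del counts against one shared epoll instance."""
    ep = select.epoll()
    results = [None] * nthreads

    def worker(idx):
        pipes = [os.pipe() for _ in range(nfds)]   # each thread owns its fds
        counts = {"add": 0, "mod": 0, "del": 0}
        for _ in range(rounds):
            for rfd, _wfd in pipes:
                ep.register(rfd, select.EPOLLIN)                    # ADD
                counts["add"] += 1
            for rfd, _wfd in pipes:
                ep.modify(rfd, select.EPOLLIN | select.EPOLLET)     # MOD
                counts["mod"] += 1
            for rfd, _wfd in pipes:
                ep.unregister(rfd)                                  # DEL
                counts["del"] += 1
        for rfd, wfd in pipes:
            os.close(rfd)
            os.close(wfd)
        results[idx] = counts

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    ep.close()
    return results

for i, counts in enumerate(stress_epoll_ctl()):
    print("[thread %2d] add: %(add)d ops; mod: %(mod)d ops; del: %(del)d ops"
          % ((i,) + (counts,)) if False else
          "[thread %2d] add: %d ops; mod: %d ops; del: %d ops"
          % (i, counts["add"], counts["mod"], counts["del"]))
```

Because every round performs one add, one mod and one del per descriptor, the three counters of a thread are always equal in this sketch.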

SEE ALSO
--------
linkperf:perf[1]