linux_dsm_epyc7002/arch/arm
Clement Courbet 0ade34c370 lib: optimize cpumask_next_and()
We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
It's essentially a joined iteration in search for a non-zero bit, which is
currently implemented as a lookup join (find a nonzero bit on the lhs,
lookup the rhs to see if it's set there).

Implement a direct join (find a nonzero bit on the incrementally built
join).  Also add generic bitmap benchmarks in the new `test_find_bit`
module for the new function (see `find_next_and_bit` in [2] and [3] below).
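
The difference between the two joins can be sketched in userspace C. This is a minimal illustration, not the kernel code: the names `sketch_next_bit` and `sketch_lookup_join` are hypothetical, `__builtin_ctzl` assumes GCC or Clang, and the search starts at `start` inclusive (as `find_next_and_bit` does) rather than at `n + 1` as `cpumask_next_and` does.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Direct join: AND the two bitmaps one word at a time and search the
 * combined word. Passing b == NULL searches a alone.
 * Returns nbits when no bit at or after start is set.
 */
static unsigned long sketch_next_bit(const unsigned long *a,
                                     const unsigned long *b,
                                     unsigned long nbits,
                                     unsigned long start)
{
    if (start >= nbits)
        return nbits;

    unsigned long idx = start / SKETCH_BITS_PER_LONG;
    unsigned long word = a[idx];
    if (b)
        word &= b[idx];
    /* Drop the bits below start in the first word. */
    word &= ~0UL << (start % SKETCH_BITS_PER_LONG);

    while (word == 0) {
        idx++;
        if (idx * SKETCH_BITS_PER_LONG >= nbits)
            return nbits;
        word = a[idx];
        if (b)
            word &= b[idx];
    }

    unsigned long bit = idx * SKETCH_BITS_PER_LONG
                        + (unsigned long)__builtin_ctzl(word);
    return bit < nbits ? bit : nbits;
}

/*
 * Lookup join: find each set bit in a, then test that bit in b.
 * Every set bit of a that is clear in b costs one more search call.
 */
static unsigned long sketch_lookup_join(const unsigned long *a,
                                        const unsigned long *b,
                                        unsigned long nbits,
                                        unsigned long start)
{
    unsigned long n = start;

    while ((n = sketch_next_bit(a, NULL, nbits, n)) < nbits) {
        if (b[n / SKETCH_BITS_PER_LONG]
            & (1UL << (n % SKETCH_BITS_PER_LONG)))
            return n;
        n++;
    }
    return nbits;
}
```

Both functions return the same index for the same inputs. When the right-hand side is empty, the lookup join restarts the search once per set bit on the left, while the direct join makes a single pass over the words. That is in line with the largest speedups in [1] appearing for `pattern2 = 0x00000000`.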

For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x
faster with a geometric mean of 2.1x on 32 CPUs [1].  No impact on memory
usage.  Note that on Arm, the new pure-C implementation still outperforms
the old one that uses a mix of C and asm (`find_next_bit`) [3].

[1] Approximate benchmark code:

```
  unsigned long src1p[nr_cpumask_longs] = {pattern1};
  unsigned long src2p[nr_cpumask_longs] = {pattern2};
  for (/*a bunch of repetitions*/) {
    for (int n = -1; n <= nr_cpu_ids; ++n) {
      asm volatile("" : "+rm"(src1p)); // prevent any optimization
      asm volatile("" : "+rm"(src2p));
      unsigned long result = cpumask_next_and(n, src1p, src2p);
      asm volatile("" : "+rm"(result));
    }
  }
```

Results:
pattern1    pattern2     time_before/time_after
0x0000ffff  0x0000ffff   1.65
0x0000ffff  0x00005555   2.24
0x0000ffff  0x00001111   2.94
0x0000ffff  0x00000000   14.0
0x00005555  0x0000ffff   1.67
0x00005555  0x00005555   1.71
0x00005555  0x00001111   1.90
0x00005555  0x00000000   6.58
0x00001111  0x0000ffff   1.46
0x00001111  0x00005555   1.49
0x00001111  0x00001111   1.45
0x00001111  0x00000000   3.10
0x00000000  0x0000ffff   1.18
0x00000000  0x00005555   1.18
0x00000000  0x00001111   1.17
0x00000000  0x00000000   1.25
-----------------------------
               geo.mean  2.06
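
As a cross-check of the `geo.mean` row, the geometric mean of the sixteen ratios above is the 16th root of their product. The sketch below is illustrative only: the helper names are hypothetical, and it uses a hand-rolled Newton square root (applied four times) purely to avoid depending on libm, where `sqrt` or `pow` from `<math.h>` would normally be used.

```c
#include <stddef.h>

/* Speedup ratios (time_before/time_after) copied from the table above. */
static const double speedups[16] = {
    1.65, 2.24, 2.94, 14.0,
    1.67, 1.71, 1.90, 6.58,
    1.46, 1.49, 1.45, 3.10,
    1.18, 1.18, 1.17, 1.25,
};

/* Square root of a positive value by Newton's method. */
static double sketch_sqrt(double x)
{
    double r = x > 1.0 ? x : 1.0;

    for (int i = 0; i < 64; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/*
 * Geometric mean of exactly 16 positive values: the 16th root of
 * their product, computed as four nested square roots.
 */
static double geo_mean16(const double v[16])
{
    double product = 1.0;

    for (size_t i = 0; i < 16; i++)
        product *= v[i];
    for (int i = 0; i < 4; i++)
        product = sketch_sqrt(product);
    return product;
}
```

Calling `geo_mean16(speedups)` gives approximately 2.06, which agrees with the value reported in the table.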

[2] test_find_next_bit, X86 (skylake)

 [ 3913.477422] Start testing find_bit() with random-filled bitmap
 [ 3913.477847] find_next_bit: 160868 cycles, 16484 iterations
 [ 3913.477933] find_next_zero_bit: 169542 cycles, 16285 iterations
 [ 3913.478036] find_last_bit: 201638 cycles, 16483 iterations
 [ 3913.480214] find_first_bit: 4353244 cycles, 16484 iterations
 [ 3913.480216] Start testing find_next_and_bit() with random-filled bitmap
 [ 3913.481074] find_next_and_bit: 89604 cycles, 8216 iterations
 [ 3913.481075] Start testing find_bit() with sparse bitmap
 [ 3913.481078] find_next_bit: 2536 cycles, 66 iterations
 [ 3913.481252] find_next_zero_bit: 344404 cycles, 32703 iterations
 [ 3913.481255] find_last_bit: 2006 cycles, 66 iterations
 [ 3913.481265] find_first_bit: 17488 cycles, 66 iterations
 [ 3913.481266] Start testing find_next_and_bit() with sparse bitmap
 [ 3913.481272] find_next_and_bit: 764 cycles, 1 iterations

[3] test_find_next_bit, arm (v7 odroid XU3).

[  267.206928] Start testing find_bit() with random-filled bitmap
[  267.214752] find_next_bit: 4474 cycles, 16419 iterations
[  267.221850] find_next_zero_bit: 5976 cycles, 16350 iterations
[  267.229294] find_last_bit: 4209 cycles, 16419 iterations
[  267.279131] find_first_bit: 1032991 cycles, 16420 iterations
[  267.286265] Start testing find_next_and_bit() with random-filled bitmap
[  267.302386] find_next_and_bit: 2290 cycles, 8140 iterations
[  267.309422] Start testing find_bit() with sparse bitmap
[  267.316054] find_next_bit: 191 cycles, 66 iterations
[  267.322726] find_next_zero_bit: 8758 cycles, 32703 iterations
[  267.329803] find_last_bit: 84 cycles, 66 iterations
[  267.336169] find_first_bit: 4118 cycles, 66 iterations
[  267.342627] Start testing find_next_and_bit() with sparse bitmap
[  267.356919] find_next_and_bit: 91 cycles, 1 iterations

[courbet@google.com: v6]
  Link: http://lkml.kernel.org/r/20171129095715.23430-1-courbet@google.com
[geert@linux-m68k.org: m68k/bitops: always include <asm-generic/bitops/find.h>]
  Link: http://lkml.kernel.org/r/1512556816-28627-1-git-send-email-geert@linux-m68k.org
Link: http://lkml.kernel.org/r/20171128131334.23491-1-courbet@google.com
Signed-off-by: Clement Courbet <courbet@google.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yury Norov <ynorov@caviumnetworks.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-06 18:32:44 -08:00
boot Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm 2018-02-02 09:50:51 -08:00
common Merge branches 'fixes', 'misc', 'sa1111' and 'sa1100-for-next' into for-next 2018-01-21 15:38:10 +00:00
configs ARM: SoC platform updates for 4.16 2018-02-01 16:17:40 -08:00
crypto crypto: hash - annotate algorithms taking optional key 2018-01-12 23:03:35 +11:00
firmware
include lib: optimize cpumask_next_and() 2018-02-06 18:32:44 -08:00
kernel Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm 2018-02-02 09:50:51 -08:00
kvm GICv4 Support for KVM/ARM for v4.15 2017-11-17 13:20:01 +01:00
lib Merge branches 'fixes', 'misc', 'sa1111' and 'sa1100-for-next' into for-next 2018-01-21 15:38:10 +00:00
mach-actions ARM: SoC platform updates for 4.15 2017-11-16 14:05:12 -08:00
mach-alpine
mach-artpec
mach-asm9260
mach-aspeed
mach-at91
mach-axxia
mach-bcm soc: brcmstb: biuctrl: Move to early_initcall 2017-12-20 17:37:44 -08:00
mach-berlin
mach-clps711x
mach-cns3xxx
mach-davinci Merge branch 'i2c/for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2018-02-04 10:57:43 -08:00
mach-digicolor
mach-dove
mach-ebsa110
mach-efm32
mach-ep93xx ARM: ep93xx: ts72xx: Add support for BK3 board - ts72xx derivative 2017-12-13 22:26:10 +01:00
mach-exynos ARM: EXYNOS: Add SPDX license identifiers 2018-01-03 18:36:22 +01:00
mach-footbridge Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-13 17:56:58 -08:00
mach-gemini
mach-highbank
mach-hisi
mach-imx ARM: imx: remove unused imx3 pm definitions 2017-12-26 16:30:20 +08:00
mach-integrator ARM: SoC platform updates for 4.15 2017-11-16 14:05:12 -08:00
mach-iop13xx
mach-iop32x treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
mach-iop33x
mach-ixp4xx w1: w1-gpio: Convert to use GPIO descriptors 2017-12-08 15:32:53 +01:00
mach-keystone
mach-ks8695 Merge branch 'i2c/for-4.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2017-11-14 17:52:21 -08:00
mach-lpc18xx
mach-lpc32xx
mach-mediatek ARM: mediatek: use more generic prompts for SoCs with ARMv7 2017-12-20 15:48:18 +01:00
mach-meson Amlogic 32-bit DT changes for v4.16 2017-12-21 16:37:34 +01:00
mach-mmp ARM: pxa: move header file out of I2C realm 2017-11-28 22:49:30 +01:00
mach-moxart
mach-mv78xx0
mach-mvebu
mach-mxs
mach-netx
mach-nomadik
mach-nspire
mach-omap1 ARM: SoC platform updates for 4.15 2017-11-16 14:05:12 -08:00
mach-omap2 ARM: SoC driver updates for 4.16 2018-02-01 16:35:31 -08:00
mach-orion5x treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
mach-oxnas
mach-picoxcell
mach-prima2
mach-pxa ARM: SoC platform updates for 4.16 2018-02-01 16:17:40 -08:00
mach-qcom
mach-realview
mach-rockchip
mach-rpc
mach-s3c24xx ARM: S3C24XX: Add SPDX license identifiers 2018-01-03 18:36:43 +01:00
mach-s3c64xx ARM: S3C64XX: Add SPDX license identifiers 2018-01-03 18:42:53 +01:00
mach-s5pv210 ARM: S5PV210: Add SPDX license identifiers 2018-01-03 18:43:04 +01:00
mach-sa1100 ARM: sa1100/neponset: add GPIO drivers for control and modem registers 2018-01-01 00:50:05 +00:00
mach-shmobile ARM: SoC platform updates for 4.15 2017-11-16 14:05:12 -08:00
mach-socfpga
mach-spear
mach-sti
mach-stm32
mach-sunxi
mach-tango
mach-tegra Merge branch 'linus' into locking/core, to resolve conflicts 2017-11-07 10:32:44 +01:00
mach-u300
mach-uniphier kbuild: remove all dummy assignments to obj- 2017-11-18 11:46:06 +09:00
mach-ux500
mach-versatile
mach-vexpress ARM: SoC platform updates for 4.15 2017-11-16 14:05:12 -08:00
mach-vt8500
mach-w90x900
mach-zx
mach-zynq
mm Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm 2018-02-02 09:50:51 -08:00
net bpf, arm: remove obsolete exception handling from div/mod 2018-01-26 16:42:07 -08:00
nwfpe
oprofile
plat-iop
plat-omap ARM: SoC platform updates for 4.15 2017-11-16 14:05:12 -08:00
plat-orion
plat-pxa
plat-samsung ARM: SAMSUNG: Add SPDX license identifiers 2018-01-03 18:43:13 +01:00
plat-versatile
probes ARM: probes: avoid adding kprobes to sensitive kernel-entry/exit code 2017-12-17 22:14:21 +00:00
tools ARM: ep93xx: ts72xx: Add support for BK3 board - ts72xx derivative 2017-12-13 22:26:10 +01:00
vdso Merge branch 'linus' into locking/core, to resolve conflicts 2017-11-07 10:32:44 +01:00
vfp signal/arm: Document conflicts with SI_USER and SIGFPE 2018-01-12 14:21:05 -06:00
xen xen: re-introduce support for grant v2 interface 2017-11-06 15:50:17 -05:00
Kconfig Currently, hardened usercopy performs dynamic bounds checking on slab 2018-02-03 16:25:42 -08:00
Kconfig-nommu Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm 2017-11-16 12:50:35 -08:00
Kconfig.debug ARM: 8737/1: mm: dump: add checking for writable and executable 2018-01-21 15:32:20 +00:00
Makefile ARM: 8723/2: always assume the "unified" syntax for assembly code 2017-12-17 22:14:21 +00:00