linux_dsm_epyc7002/arch/powerpc/lib
Simon Guo c2a4e54e8b powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp()
This patch is based on the previous VMX patch on memcmp().

To optimize ppc64 memcmp() with VMX instructions, we need to account for
the penalty VMX brings: if the kernel uses VMX instructions, it must
save/restore the current thread's VMX registers. PPC has 32 x 128-bit
VMX registers, which means 32 x 16 = 512 bytes to load and store.

The major consumer of memcmp() in the kernel is KSM, which calls
memcmp() frequently to merge identical pages, so it makes sense to tune
memcmp() with KSM in mind. In the following mail, Cyril Bur points out
that memcmp() calls from KSM are likely to fail (mismatch) within the
first few bytes:
	https://patchwork.ozlabs.org/patch/817322/#1773629
This patch is a follow-up on that observation.

Testing shows that KSM memcmp() calls usually fail within the first 32
bytes.  More specifically:
    - 76% of cases mismatch within the first 16 bytes;
    - 83% of cases mismatch within the first 32 bytes;
    - 84% of cases mismatch within the first 64 bytes;
So 32 bytes is a better pre-check length than the alternatives.

Early mismatches also occur for memcmp() in the non-KSM case: with a
non-typical call load, ~73% of cases fail within the first 32 bytes.

This patch adds a 32-byte pre-check before jumping into VMX operations,
to avoid the unnecessary VMX penalty. It is not limited to the KSM
case. Testing shows a ~20% improvement in average memcmp() execution
time with this patch.

Note that the 32-byte pre-check is only performed when the compare size
is long enough (currently >= 4K) to use the VMX path at all.
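The control flow described above can be sketched in C (the actual kernel implementation is PPC64 assembly in memcmp_64.S; the function names and the VMX_THRESHOLD/PRECHECK_LEN constants here are hypothetical, and plain memcmp() stands in for both the scalar and VMX loops):

```c
#include <string.h>

/* Assumed threshold: the VMX path is only worth the 512-byte
 * register save/restore for sufficiently long compares. */
#define VMX_THRESHOLD 4096
#define PRECHECK_LEN  32

/* Stand-in for the VMX-accelerated compare loop. */
static int vmx_memcmp(const void *a, const void *b, size_t n)
{
	return memcmp(a, b, n);
}

static int memcmp_sketch(const void *a, const void *b, size_t n)
{
	if (n >= VMX_THRESHOLD) {
		/* 32-byte scalar pre-check: most mismatches (~83% in
		 * the KSM workload measured above) are caught here,
		 * before paying the VMX save/restore penalty. */
		int r = memcmp(a, b, PRECHECK_LEN);
		if (r)
			return r;
		return vmx_memcmp((const char *)a + PRECHECK_LEN,
				  (const char *)b + PRECHECK_LEN,
				  n - PRECHECK_LEN);
	}
	/* Short compares never enter the VMX path. */
	return memcmp(a, b, n);
}
```

The key point is ordering: the cheap scalar check runs first, so the expensive VMX setup cost is only paid by the minority of calls whose buffers agree in their first 32 bytes.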

Detailed data and analysis are at:
https://github.com/justdoitqd/publicFiles/blob/master/memcmp/README.md

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:21 +10:00
alloc.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
checksum_32.S powerpc: Implement csum_ipv6_magic in assembly 2018-06-04 00:39:19 +10:00
checksum_64.S powerpc: Implement csum_ipv6_magic in assembly 2018-06-04 00:39:19 +10:00
checksum_wrappers.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
code-patching.c powerpc/lib/feature-fixups: use raw_patch_instruction() 2018-01-21 15:06:25 +11:00
copy_32.S powerpc/32: remove a NOP from memset() 2017-09-01 16:42:46 +10:00
copypage_64.S powerpc/64s: Set assembler machine type to POWER4 2018-04-01 00:47:49 +11:00
copypage_power7.S powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision 2018-07-24 22:03:21 +10:00
copyuser_64.S powerpc/64s: Set assembler machine type to POWER4 2018-04-01 00:47:49 +11:00
copyuser_power7.S powerpc/64s: Set assembler machine type to POWER4 2018-04-01 00:47:49 +11:00
crtsavres.S powerpc/64: Do not create new section for save/restore functions 2017-05-30 14:59:51 +10:00
div64.S powerpc: Fix a corner case in __div64_32 2005-10-20 09:37:02 +10:00
feature-fixups-test.S powerpc/lib: Add alt patching test of branching past the last instruction 2018-05-11 23:29:03 +10:00
feature-fixups.c powerpc updates for 4.18 2018-06-07 10:23:33 -07:00
hweight_64.S ppc: move exports to definitions 2016-08-07 23:50:09 -04:00
ldstfp.S powerpc: Fix kernel crash in emulation of vector loads and stores 2017-09-04 19:38:07 +10:00
locks.c powerpc/spinlock: Fix spin_unlock_wait() 2016-06-14 16:05:44 +10:00
Makefile powerpc/lib: optimise PPC32 memcmp 2018-06-04 00:39:21 +10:00
mem_64.S powerpc/string: Implement optimized memset variants 2017-08-17 23:04:35 +10:00
memcmp_32.S powerpc/lib: optimise PPC32 memcmp 2018-06-04 00:39:21 +10:00
memcmp_64.S powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp() 2018-07-24 22:03:21 +10:00
memcpy_64.S powerpc/64s: Set assembler machine type to POWER4 2018-04-01 00:47:49 +11:00
memcpy_power7.S powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision 2018-07-24 22:03:21 +10:00
pmem.c powerpc/lib: Implement UACCESS_FLUSHCACHE API 2017-11-13 08:00:31 +11:00
quad.S powerpc: Handle most loads and stores in instruction emulation code 2017-09-01 16:39:48 +10:00
rheap.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
sstep.c powerpc/sstep: Fix kernel crash if VSX is not present 2018-06-04 00:39:08 +10:00
string_32.S powerpc/lib: optimise 32 bits __clear_user() 2018-06-04 00:39:21 +10:00
string_64.S powerpc: Fix invalid use of register expressions 2017-08-10 22:29:41 +10:00
string.S powerpc/lib: optimise PPC32 memcmp 2018-06-04 00:39:21 +10:00
test_emulate_step.c powerpc/sstep: Fix emulate_step test if VSX not present 2018-06-04 00:39:14 +10:00
vmx-helper.c powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision 2018-07-24 22:03:21 +10:00
xor_vmx_glue.c powerpc/altivec: Add missing prototypes for altivec 2018-05-25 12:04:38 +10:00
xor_vmx.c powerpc/lib/xor_vmx: Ensure no altivec code executes before enable_kernel_altivec() 2017-06-02 20:17:52 +10:00
xor_vmx.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00