linux_dsm_epyc7002/arch/x86/lib
Borislav Petkov f25d384755 x86/asm: Optimize clear_page()
Currently, we CALL clear_page() which then JMPs to the proper function
chosen by the alternatives.

What we should do instead is CALL the proper function directly. (This
was something Ingo suggested a while ago). So let's do that.

Measuring our favourite kernel build workload shows that there are no
significant changes in performance.

AMD
===
  -- /tmp/before 2017-02-09 18:01:46.451961188 +0100
  ++ /tmp/after  2017-02-09 18:01:54.883961175 +0100
  @@ -1,15 +1,15 @@
    Performance counter stats for 'system wide' (5 runs):

  -    1028960.373643      cpu-clock (msec)          #    6.000 CPUs utilized            ( +-  1.41% )
  +    1023086.018961      cpu-clock (msec)          #    6.000 CPUs utilized            ( +-  1.20% )
  -           518,744      context-switches          #    0.504 K/sec                    ( +-  1.04% )
  +           518,254      context-switches          #    0.507 K/sec                    ( +-  1.01% )
  -            38,112      cpu-migrations            #    0.037 K/sec                    ( +-  1.95% )
  +            37,917      cpu-migrations            #    0.037 K/sec                    ( +-  1.02% )
  -        20,874,266      page-faults               #    0.020 M/sec                    ( +-  0.07% )
  +        20,918,897      page-faults               #    0.020 M/sec                    ( +-  0.18% )
  - 2,043,646,230,667      cycles                    #    1.986 GHz                      ( +-  0.14% )  (66.67%)
  + 2,045,305,584,032      cycles                    #    1.999 GHz                      ( +-  0.16% )  (66.67%)
  -   553,698,855,431      stalled-cycles-frontend   #   27.09% frontend cycles idle     ( +-  0.07% )  (66.67%)
  +   555,099,401,413      stalled-cycles-frontend   #   27.14% frontend cycles idle     ( +-  0.13% )  (66.67%)
  -   621,544,286,390      stalled-cycles-backend    #   30.41% backend cycles idle      ( +-  0.39% )  (66.67%)
  +   621,371,430,254      stalled-cycles-backend    #   30.38% backend cycles idle      ( +-  0.32% )  (66.67%)
  - 1,738,364,431,659      instructions              #    0.85  insn per cycle
  + 1,739,895,771,901      instructions              #    0.85  insn per cycle
  -                                                  #    0.36  stalled cycles per insn  ( +-  0.11% )  (66.67%)
  +                                                  #    0.36  stalled cycles per insn  ( +-  0.13% )  (66.67%)
  -   391,170,943,850      branches                  #  380.161 M/sec                    ( +-  0.13% )  (66.67%)
  +   391,398,551,757      branches                  #  382.567 M/sec                    ( +-  0.13% )  (66.67%)
  -    22,567,810,411      branch-misses             #    5.77% of all branches          ( +-  0.11% )  (66.67%)
  +    22,574,726,683      branch-misses             #    5.77% of all branches          ( +-  0.13% )  (66.67%)

  -     171.480741921 seconds time elapsed                                          ( +-  1.41% )
  +     170.509229451 seconds time elapsed                                          ( +-  1.20% )

Intel
=====

  -- /tmp/before 2017-02-09 20:36:19.851947473 +0100
  ++ /tmp/after  2017-02-09 20:36:30.151947458 +0100
  @@ -1,15 +1,15 @@
    Performance counter stats for 'system wide' (5 runs):

  -    2207248.598126      cpu-clock (msec)          #    8.000 CPUs utilized            ( +-  0.69% )
  +    2213300.106631      cpu-clock (msec)          #    8.000 CPUs utilized            ( +-  0.73% )
  -           899,342      context-switches          #    0.407 K/sec                    ( +-  0.68% )
  +           898,381      context-switches          #    0.406 K/sec                    ( +-  0.79% )
  -            80,553      cpu-migrations            #    0.036 K/sec                    ( +-  1.13% )
  +            80,979      cpu-migrations            #    0.037 K/sec                    ( +-  1.11% )
  -        36,171,148      page-faults               #    0.016 M/sec                    ( +-  0.02% )
  +        36,179,791      page-faults               #    0.016 M/sec                    ( +-  0.02% )
  - 6,665,288,826,484      cycles                    #    3.020 GHz                      ( +-  0.07% )  (83.33%)
  + 6,671,638,410,799      cycles                    #    3.014 GHz                      ( +-  0.06% )  (83.33%)
  - 5,065,975,115,197      stalled-cycles-frontend   #   76.01% frontend cycles idle     ( +-  0.11% )  (83.33%)
  + 5,076,835,183,223      stalled-cycles-frontend   #   76.10% frontend cycles idle     ( +-  0.11% )  (83.33%)
  - 3,841,556,350,614      stalled-cycles-backend    #   57.64% backend cycles idle      ( +-  0.13% )  (66.67%)
  + 3,852,823,974,333      stalled-cycles-backend    #   57.75% backend cycles idle      ( +-  0.12% )  (66.67%)
  - 4,148,398,171,079      instructions              #    0.62  insn per cycle
  + 4,148,997,156,059      instructions              #    0.62  insn per cycle
  -                                                  #    1.22  stalled cycles per insn  ( +-  0.10% )  (83.33%)
  +                                                  #    1.22  stalled cycles per insn  ( +-  0.11% )  (83.33%)
  -   887,187,118,591      branches                  #  401.943 M/sec                    ( +-  0.09% )  (83.33%)
  +   887,271,341,121      branches                  #  400.882 M/sec                    ( +-  0.11% )  (83.33%)
  -    30,139,439,034      branch-misses             #    3.40% of all branches          ( +-  0.09% )  (83.33%)
  +    30,134,864,997      branch-misses             #    3.40% of all branches          ( +-  0.06% )  (83.33%)

  -     275.904405540 seconds time elapsed                                          ( +-  0.69% )
  +     276.660352016 seconds time elapsed                                          ( +-  0.73% )

allmodconfig vmlinux size grows by a ~1Kb but that's fine - we optimize
our calling of the clear_page variants.

     text    data     bss     dec     hex filename
  9051979 23067670        27009024        59128673        3863b61		vmlinux
  9053000 23067670        27009024        59129694        3863f5e		vmlinux.clear_page

Reported-by: kernel test robot <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170215111927.emdgxf2pide3kwro@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-07 08:28:00 +01:00
..
.gitignore x86: Gitignore: arch/x86/lib/inat-tables.c 2009-11-04 13:11:28 +01:00
atomic64_32.c x86: Adjust asm constraints in atomic64 wrappers 2012-01-20 17:29:31 -08:00
atomic64_386_32.S x86/debug: Remove perpetually broken, unmaintainable dwarf annotations 2015-06-02 07:57:48 +02:00
atomic64_cx8_32.S x86/debug: Remove perpetually broken, unmaintainable dwarf annotations 2015-06-02 07:57:48 +02:00
cache-smp.c x86/lib: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:58 +02:00
checksum_32.S x86: move exports to actual definitions 2016-08-07 23:47:15 -04:00
clear_page_64.S x86/asm: Optimize clear_page() 2017-03-07 08:28:00 +01:00
cmdline.c x86/boot: Pass in size to early cmdline parsing 2016-02-03 12:03:18 +01:00
cmpxchg8b_emu.S x86: move exports to actual definitions 2016-08-07 23:47:15 -04:00
cmpxchg16b_emu.S x86/debug: Remove perpetually broken, unmaintainable dwarf annotations 2015-06-02 07:57:48 +02:00
copy_page_64.S x86: move exports to actual definitions 2016-08-07 23:47:15 -04:00
copy_user_64.S x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants 2016-11-01 07:41:27 +01:00
cpu.c x86/lib: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:58 +02:00
csum-copy_64.S x86/debug: Remove perpetually broken, unmaintainable dwarf annotations 2015-06-02 07:57:48 +02:00
csum-partial_64.c x86: move exports to actual definitions 2016-08-07 23:47:15 -04:00
csum-wrappers_64.c Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-01 14:23:42 -04:00
delay.c x86/timer: Make delay() work during early bootup 2017-01-22 10:03:12 +01:00
getuser.S x86: move exports to actual definitions 2016-08-07 23:47:15 -04:00
hweight.S Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2016-10-14 14:26:58 -07:00
inat.c x86: Fix to decode grouped AVX with VEX pp bits 2012-02-11 15:11:35 +01:00
insn.c x86/insn: Add AVX-512 support to the instruction decoder 2016-07-21 09:37:11 -03:00
iomap_copy_64.S x86/debug: Remove perpetually broken, unmaintainable dwarf annotations 2015-06-02 07:57:48 +02:00
kaslr.c x86/mm/kaslr: Fix -Wformat-security warning 2016-08-11 10:58:12 +02:00
Makefile Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-25 17:32:28 -07:00
memcpy_32.c x86/lib: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:58 +02:00
memcpy_64.S Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2016-10-14 14:26:58 -07:00
memmove_64.S x86: move exports to actual definitions 2016-08-07 23:47:15 -04:00
memset_64.S x86: move exports to actual definitions 2016-08-07 23:47:15 -04:00
misc.c x86/boot: Further compress CPUs bootup message 2013-10-01 10:52:30 +02:00
mmx_32.c x86/lib: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:58 +02:00
msr-reg-export.c x86/lib: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:58 +02:00
msr-reg.S x86/debug: Remove perpetually broken, unmaintainable dwarf annotations 2015-06-02 07:57:48 +02:00
msr-smp.c x86/lib: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:58 +02:00
msr.c x86/msr: Cleanup/streamline MSR helpers 2016-11-16 10:23:02 +01:00
putuser.S x86: move exports to actual definitions 2016-08-07 23:47:15 -04:00
rwsem.S locking/rwsem: Fix comment on register clobbering 2016-05-16 12:35:40 +02:00
string_32.c x86/lib: Audit and remove any unnecessary uses of module.h 2016-07-14 15:06:58 +02:00
strstr_32.c x86: move exports to actual definitions 2016-08-07 23:47:15 -04:00
usercopy_32.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
usercopy_64.c Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-01 14:23:42 -04:00
usercopy.c x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants 2016-11-01 07:41:27 +01:00
x86-opcode-map.txt libnvdimm for 4.8 2016-07-28 17:38:16 -07:00