linux_dsm_epyc7002/include
Hiro Yoshioka c22ce143d1 [PATCH] x86: cache pollution aware __copy_from_user_ll()
Use the x86 cache-bypassing copy instructions for copy_from_user().
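
For readers unfamiliar with cache-bypassing (non-temporal) stores, the idea can be sketched in user space as below. This is only an illustration, not the kernel code from this patch: `copy_nocache` is a hypothetical name, and the sketch assumes an x86 compiler with SSE2 intrinsics (it falls back to `memcpy` otherwise). The destination is written with MOVNTI via `_mm_stream_si32`, so the copied data is not pulled into the cache hierarchy on the store side.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32 (MOVNTI), _mm_sfence */
#endif

/* Hypothetical user-space helper; NOT the kernel's __copy_from_user_ll(). */
static void copy_nocache(void *dst, const void *src, size_t n)
{
#if defined(__SSE2__)
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Copy leading bytes normally until dst is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 3) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Stream 32-bit words with non-temporal stores. */
    while (n >= 4) {
        int w;
        memcpy(&w, s, sizeof w);          /* alignment-safe load from src */
        _mm_stream_si32((int *)d, w);     /* store that bypasses the cache */
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Copy any remaining tail bytes normally. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }

    /* Order the streamed stores before any later stores. */
    _mm_sfence();
#else
    /* No SSE2 available: plain cached copy (assumption for portability). */
    memcpy(dst, src, n);
#endif
}
```

Such a copy tends to pay off when the destination will not be read again soon, which matches the write-heavy iozone workload measured below.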

Performance data:

Total of GLOBAL_POWER_EVENTS (CPU cycle samples)

2.6.12.4.orig    1921587
2.6.12.4.nt      1599424
1599424/1921587=83.23% (16.77% reduction)

BSQ_CACHE_REFERENCE (L3 cache miss)
2.6.12.4.orig      57427
2.6.12.4.nt        20858
20858/57427=36.32% (63.68% reduction)

L3 cache miss reduction of __copy_from_user_ll
(first row: 2.6.12.4.orig, second row: 2.6.12.4.nt)
samples  %        app name                 symbol name
37408    65.1412  vmlinux                  __copy_from_user_ll
23        0.1103  vmlinux                  __copy_user_zeroing_intel_nocache
23/37408=0.061% (99.94% reduction)

Top 5 of 2.6.12.4.nt
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples  %        app name                 symbol name
128392    8.0274  vmlinux                  __copy_user_zeroing_intel_nocache
64206     4.0143  vmlinux                  journal_add_journal_head
59746     3.7355  vmlinux                  do_get_write_access
47674     2.9807  vmlinux                  journal_put_journal_head
46021     2.8774  vmlinux                  journal_dirty_metadata
pattern9-0-cpu4-0-09011728/summary.out

Counted BSQ_CACHE_REFERENCE events (cache references seen by the bus unit) with a unit mask of 0x3f (multiple flags) count 3000
samples  %        app name                 symbol name
69755     4.2861  vmlinux                  __copy_user_zeroing_intel_nocache
55685     3.4215  vmlinux                  journal_add_journal_head
52371     3.2179  vmlinux                  __find_get_block
45504     2.7960  vmlinux                  journal_put_journal_head
36005     2.2123  vmlinux                  journal_stop
pattern9-0-cpu4-0-09011744/summary.out

Counted BSQ_CACHE_REFERENCE events (cache references seen by the bus unit) with a unit mask of 0x200 (read 3rd level cache miss) count 3000
samples  %        app name                 symbol name
1147      5.4994  vmlinux                  journal_add_journal_head
881       4.2240  vmlinux                  journal_dirty_data
872       4.1809  vmlinux                  blk_rq_map_sg
734       3.5192  vmlinux                  journal_commit_transaction
617       2.9582  vmlinux                  radix_tree_delete
pattern9-0-cpu4-0-09011731/summary.out

iozone results:

original 2.6.12.4 CPU time = 207.768 sec
cache aware       CPU time = 184.783 sec
(each figure is the sum of three runs)
184.783/207.768=88.94% (11.06% reduction)

original:
pattern9-0-cpu4-0-08191720/iozone.out:  CPU Utilization: Wall time   45.997    CPU time   64.527    CPU utilization 140.28 %
pattern9-0-cpu4-0-08191741/iozone.out:  CPU Utilization: Wall time   46.878    CPU time   71.933    CPU utilization 153.45 %
pattern9-0-cpu4-0-08191743/iozone.out:  CPU Utilization: Wall time   45.152    CPU time   71.308    CPU utilization 157.93 %

cache aware:
pattern9-0-cpu4-0-09011728/iozone.out:  CPU Utilization: Wall time   44.842    CPU time   62.465    CPU utilization 139.30 %
pattern9-0-cpu4-0-09011731/iozone.out:  CPU Utilization: Wall time   44.718    CPU time   59.273    CPU utilization 132.55 %
pattern9-0-cpu4-0-09011744/iozone.out:  CPU Utilization: Wall time   44.367    CPU time   63.045    CPU utilization 142.10 %
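
The CPU time totals and the reduction figure quoted above can be re-derived from the per-run values. The following is a small sketch of that arithmetic; the helper names `sum_runs` and `reduction_percent` are hypothetical, and the arrays simply repeat the CPU time column from the six iozone.out lines.

```c
#include <stddef.h>

/* CPU time (seconds) of each run, copied from the iozone.out lines above. */
static const double cpu_original[3]    = { 64.527, 71.933, 71.308 };
static const double cpu_cache_aware[3] = { 62.465, 59.273, 63.045 };

/* Sum n per-run measurements. */
static double sum_runs(const double *v, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Percentage by which `after` is smaller than `before`. */
static double reduction_percent(double before, double after)
{
    return 100.0 * (1.0 - after / before);
}
```

Calling `reduction_percent(sum_runs(cpu_original, 3), sum_runs(cpu_cache_aware, 3))` gives roughly 11.06, matching the figure in the summary.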

Signed-off-by: Hiro Yoshioka <hyoshiok@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:56 -07:00
acpi [PATCH] Unify pxm_to_node() and node_to_pxm() 2006-06-23 07:42:48 -07:00
asm-alpha [PATCH] vgacon: make VGA_MAP_MEM take size, remove extra use 2006-06-22 15:05:58 -07:00
asm-arm Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm 2006-06-22 22:46:28 -07:00
asm-arm26
asm-cris
asm-frv [PATCH] frv: clean frv unistd.h 2006-06-23 07:42:55 -07:00
asm-generic [PATCH] squash duplicate page_to_pfn and pfn_to_page 2006-06-23 07:42:47 -07:00
asm-h8300
asm-i386 [PATCH] x86: cache pollution aware __copy_from_user_ll() 2006-06-23 07:42:56 -07:00
asm-ia64 [PATCH] page migration: sys_move_pages(): support moving of individual pages 2006-06-23 07:42:53 -07:00
asm-m32r [PATCH] vgacon: make VGA_MAP_MEM take size, remove extra use 2006-06-22 15:05:58 -07:00
asm-m68k
asm-m68knommu
asm-mips [PATCH] Au1550/1200: add missing PSC #define's, make OSS driver use the proper ones 2006-06-23 07:42:56 -07:00
asm-parisc [PATCH] Delete unused definitions of kvaddr_to_nid 2006-06-23 07:42:52 -07:00
asm-powerpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc 2006-06-22 22:11:30 -07:00
asm-ppc Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc 2006-06-22 22:11:30 -07:00
asm-s390 [PATCH] s390: add __raw_writeq required by __iowrite64_copy 2006-06-20 19:55:53 -07:00
asm-sh
asm-sh64
asm-sparc Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 2006-06-20 17:39:28 -07:00
asm-sparc64 [PATCH] vgacon: make VGA_MAP_MEM take size, remove extra use 2006-06-22 15:05:58 -07:00
asm-um Merge git://git.infradead.org/hdrcleanup-2.6 2006-06-20 15:10:08 -07:00
asm-v850
asm-x86_64 [PATCH] sys_move_pages: x86_64 support 2006-06-23 07:42:53 -07:00
asm-xtensa [PATCH] vgacon: make VGA_MAP_MEM take size, remove extra use 2006-06-22 15:05:58 -07:00
keys
linux [PATCH] x86: cache pollution aware __copy_from_user_ll() 2006-06-23 07:42:56 -07:00
math-emu
media
mtd Merge git://git.infradead.org/hdrcleanup-2.6 2006-06-20 15:10:08 -07:00
net Merge branch 'master' into upstream 2006-06-22 22:51:46 -04:00
pcmcia
rdma IB/uverbs: Don't serialize with ib_uverbs_idr_mutex 2006-06-17 20:44:49 -07:00
rxrpc
scsi Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 2006-06-21 11:18:25 -07:00
sound [ALSA] version 1.0.12rc1 2006-06-22 21:35:11 +02:00
video