linux_dsm_epyc7002/include
Christoph Lameter 8ff12cfc00 SLUB: Support for performance statistics
The statistics provided here allow the monitoring of allocator behavior, but
at the cost of some (minimal) loss of performance. Counters are placed in
SLUB's per cpu data structure. With the statistics added, the per cpu
structure may grow larger than one cacheline, which increases the cache
footprint of SLUB.

A compile option enables or disables the inclusion of the runtime
statistics; it is off by default.

The slabinfo tool is enhanced to support these statistics via two options:

-D 	Switches the line of information displayed for a slab from size
	mode to activity mode.

-A	Sorts the slabs displayed by activity. This allows the display of
	the slabs most important to the performance of a certain load.

-r	Report option will report detailed statistics on

Example (tbench load):

slabinfo -AD		->Shows the most active slabs

Name                   Objects    Alloc     Free   %Fast
skbuff_fclone_cache         33 111953835 111953835  99  99
:0000192                  2666  5283688  5281047  99  99
:0001024                   849  5247230  5246389  83  83
vm_area_struct            1349   119642   118355  91  22
:0004096                    15    66753    66751  98  98
:0000064                  2067    25297    23383  98  78
dentry                   10259    28635    18464  91  45
:0000080                 11004    18950     8089  98  98
:0000096                  1703    12358    10784  99  98
:0000128                   762    10582     9875  94  18
:0000512                   184     9807     9647  95  81
:0002048                   479     9669     9195  83  65
anon_vma                   777     9461     9002  99  71
kmalloc-8                 6492     9981     5624  99  97
:0000768                   258     7174     6931  58  15

So the skbuff_fclone_cache is of highest importance for the tbench load.
There is also a fairly high load on the 192 byte slab. Look up its aliases:

slabinfo -a | grep 000192
:0000192     <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP
	request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili

Likely skbuff_head_cache.
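
The ordering of the -AD listing above can be reproduced with a small sketch.
It assumes that "activity" means allocations plus frees; the commit text does
not define the sort key, but the sample rows are consistent with that reading.
The names ROWS, activity and sort_by_activity are hypothetical helpers, not
part of slabinfo.

```python
# Rows copied from the "slabinfo -AD" example: (name, allocs, frees).
ROWS = [
    ("skbuff_fclone_cache", 111953835, 111953835),
    (":0000192", 5283688, 5281047),
    (":0001024", 5247230, 5246389),
    ("vm_area_struct", 119642, 118355),
    (":0004096", 66753, 66751),
]


def activity(row):
    """Assumed sort key: total number of allocations and frees."""
    _name, allocs, frees = row
    return allocs + frees


def sort_by_activity(rows):
    """Return the rows ordered from most to least active."""
    return sorted(rows, key=activity, reverse=True)


# Shuffle the input by reversing it, then recover the displayed order.
ranked = sort_by_activity(list(reversed(ROWS)))
for name, allocs, frees in ranked:
    print(f"{name:<24}{allocs:>10}{frees:>10}")
```

Sorting on the sum rather than on allocations alone matters for caches such
as dentry, where the two counts differ noticeably.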


Looking into the statistics of the skbuff_fclone_cache is possible through

slabinfo skbuff_fclone_cache	->-r option implied if cache name is mentioned


.... Usual output ...

Slab Perf Counter       Alloc     Free %Al %Fr
--------------------------------------------------
Fastpath             111953360 111946981  99  99
Slowpath                 1044     7423   0   0
Page Alloc                272      264   0   0
Add partial                25      325   0   0
Remove partial             86      264   0   0
RemoteObj/SlabFrozen      350     4832   0   0
Total                111954404 111954404

Flushes       49 Refill        0
Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%)

Looks good because the fastpath is overwhelmingly taken.
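
The %Al and %Fr columns can be checked by hand. The sketch below assumes the
percentages are integer-truncated shares of the Total row; that rounding rule
is an assumption inferred from the sample output, and the helper names are
hypothetical.

```python
def percent(part, total):
    """Integer share of total, truncated (assumed rounding rule)."""
    return part * 100 // total if total else 0


# Counters copied from the skbuff_fclone_cache example: (alloc, free).
fastpath = (111953360, 111946981)
slowpath = (1044, 7423)

# Total is the sum of fastpath and slowpath operations per column.
total_alloc = fastpath[0] + slowpath[0]
total_free = fastpath[1] + slowpath[1]

fast_pct = (percent(fastpath[0], total_alloc), percent(fastpath[1], total_free))
slow_pct = (percent(slowpath[0], total_alloc), percent(slowpath[1], total_free))

print("Total   ", total_alloc, total_free)
print("Fastpath", fast_pct)
print("Slowpath", slow_pct)
```

Both totals come out to 111954404, matching the Total row, which confirms
that every allocation and free is counted as either fastpath or slowpath.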


skbuff_head_cache:

Slab Perf Counter       Alloc     Free %Al %Fr
--------------------------------------------------
Fastpath              5297262  5259882  99  99
Slowpath                 4477    39586   0   0
Page Alloc                937      824   0   0
Add partial                 0     2515   0   0
Remove partial           1691      824   0   0
RemoteObj/SlabFrozen     2621     9684   0   0
Total                 5301739  5299468

Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)


Descriptions of the output:

Total:		The total number of allocations and frees that occurred for a
		slab

Fastpath:	The number of allocations/frees that used the fastpath.

Slowpath:	The remaining allocations/frees, which took the slowpath.

Page Alloc:	Number of calls to the page allocator as a result of slowpath
		processing

Add Partial:	Number of slabs added to the partial list through free or
		alloc (occurs during cpuslab flushes)

Remove Partial:	Number of slabs removed from the partial list as a result of
		allocations retrieving a partial slab or by a free freeing
		the last object of a slab.

RemoteObj/Froz:	RemoteObj: How many times remotely freed objects were
		encountered when a slab was about to be deactivated. Frozen:
		How many times free was able to skip list processing because
		the slab was in use as the cpuslab of another processor.

Flushes:	Number of times the cpuslab was flushed on request
		(kmem_cache_shrink, may result from races in __slab_alloc)

Refill:		Number of times we were able to refill the cpuslab from
		remotely freed objects for the same slab.

Deactivate:	Statistics on how slabs were deactivated. Shows how they were
		put onto the partial list.
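
The percentages on the Deactivate line can be reproduced from the four raw
counts. This is a minimal sketch that assumes each percentage is the
integer-truncated share of all deactivations; the function name and the
output formatting are hypothetical and only mirror the sample output.

```python
def deactivate_line(full, empty, to_head, to_tail):
    """Format deactivation counts with truncated percentage shares."""
    counts = [("Full", full), ("Empty", empty),
              ("ToHead", to_head), ("ToTail", to_tail)]
    total = sum(count for _label, count in counts)
    parts = []
    for label, count in counts:
        # Guard against a cache with no deactivations at all.
        share = count * 100 // total if total else 0
        parts.append(f"{label}={count}({share}%)")
    return "Deactivate " + " ".join(parts)


# Counts taken from the two example caches.
fclone_line = deactivate_line(325, 0, 24, 1)
head_line = deactivate_line(2620, 0, 0, 0)
print(fclone_line)
print(head_line)
```

Because of truncation the shares need not add up to 100, as in the
skbuff_fclone_cache case (92 + 6 + 0).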

In general the fastpath is very good. A slowpath without partial list
processing is also desirable. Any access to the partial list takes node
specific locks, which may cause list lock contention.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
2008-02-07 17:47:41 -08:00
acpi Merge branches 'release' and 'fluff' into release 2008-02-07 03:38:22 -05:00
asm-alpha Add cmpxchg64 and cmpxchg64_local to alpha 2008-02-07 08:42:30 -08:00
asm-arm Add cmpxchg_local to arm 2008-02-07 08:42:31 -08:00
asm-avr32 Add cmpxchg_local to avr32 2008-02-07 08:42:31 -08:00
asm-blackfin Add cmpxchg_local to blackfin, replace __cmpxchg by generic cmpxchg 2008-02-07 08:42:31 -08:00
asm-cris Add cmpxchg_local to cris 2008-02-07 08:42:31 -08:00
asm-frv Add cmpxchg_local to frv 2008-02-07 08:42:32 -08:00
asm-generic Add cmpxchg_local to asm-generic for per cpu atomic operations 2008-02-07 08:42:30 -08:00
asm-h8300 Add cmpxchg_local to h8300 2008-02-07 08:42:32 -08:00
asm-ia64 Add cmpxchg_local, cmpxchg64 and cmpxchg64_local to ia64 2008-02-07 08:42:32 -08:00
asm-m32r local_t m32r use architecture specific cmpxchg_local 2008-02-07 08:42:32 -08:00
asm-m68k m68k: kill page walker compile warning 2008-02-07 09:10:06 -08:00
asm-m68knommu Add cmpxchg_local to m68knommu 2008-02-07 08:42:32 -08:00
asm-mips Add cmpxchg64 and cmpxchg64_local to mips 2008-02-07 08:42:30 -08:00
asm-parisc Add cmpxchg_local to parisc 2008-02-07 08:42:32 -08:00
asm-powerpc Merge branch 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc 2008-02-07 09:02:26 -08:00
asm-ppc Add cmpxchg_local to ppc 2008-02-07 08:42:32 -08:00
asm-s390 Add cmpxchg_local to s390 2008-02-07 08:42:32 -08:00
asm-sh Sanitize the type of struct user.u_ar0 2008-02-07 08:42:30 -08:00
asm-sparc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 2008-02-07 10:21:26 -08:00
asm-sparc64 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 2008-02-07 10:21:26 -08:00
asm-um uml: LDT mutex conversion 2008-02-05 09:44:31 -08:00
asm-v850 Add cmpxchg_local to v850 2008-02-07 08:42:33 -08:00
asm-x86 Add cmpxchg64 and cmpxchg64_local to x86_64 2008-02-07 08:42:31 -08:00
asm-xtensa Add cmpxchg_local to xtensa 2008-02-07 08:42:33 -08:00
crypto
keys
linux SLUB: Support for performance statistics 2008-02-07 17:47:41 -08:00
math-emu
media include/media/: Spelling fixes 2008-02-03 17:19:47 +02:00
mtd Merge git://git.infradead.org/~dedekind/ubi-2.6 2008-02-03 22:07:40 +11:00
net 9p: add support for sticky bit 2008-02-06 19:25:06 -06:00
pcmcia pcmcia: replace kio_addr_t with unsigned int everywhere 2008-02-05 09:44:08 -08:00
rdma
rxrpc
scsi include/scsi/: Spelling fixes 2008-02-03 17:47:00 +02:00
sound [ALSA] version 1.0.16rc2 2008-01-31 17:40:18 +01:00
video atmel_lcdfb: backlight control 2008-02-06 10:41:16 -08:00
xen x86: page.h: make pte_t a union to always include 2008-01-30 13:32:57 +01:00
Kbuild