linux_dsm_epyc7002/include
Rasmus Villemoes 94a58c360a slab.h: sprinkle __assume_aligned attributes
The various allocators return aligned memory.  Telling the compiler that
allows it to generate better code in many cases, for example when the
return value is immediately passed to memset().

Some code does become larger, but at least we win twice as much as we lose:

$ scripts/bloat-o-meter /tmp/vmlinux vmlinux
add/remove: 0/0 grow/shrink: 13/52 up/down: 995/-2140 (-1145)

An example of the different (and smaller) code can be seen in mm_alloc(). Before:

:       48 8d 78 08             lea    0x8(%rax),%rdi
:       48 89 c1                mov    %rax,%rcx
:       48 89 c2                mov    %rax,%rdx
:       48 c7 00 00 00 00 00    movq   $0x0,(%rax)
:       48 c7 80 48 03 00 00    movq   $0x0,0x348(%rax)
:       00 00 00 00
:       31 c0                   xor    %eax,%eax
:       48 83 e7 f8             and    $0xfffffffffffffff8,%rdi
:       48 29 f9                sub    %rdi,%rcx
:       81 c1 50 03 00 00       add    $0x350,%ecx
:       c1 e9 03                shr    $0x3,%ecx
:       f3 48 ab                rep stos %rax,%es:(%rdi)

After:

:       48 89 c2                mov    %rax,%rdx
:       b9 6a 00 00 00          mov    $0x6a,%ecx
:       31 c0                   xor    %eax,%eax
:       48 89 d7                mov    %rdx,%rdi
:       f3 48 ab                rep stos %rax,%es:(%rdi)

So gcc's strategy is to do two possibly (but not really, of course)
unaligned stores to the first and last word, then do an aligned rep stos
covering the middle part with a little overlap.  Maybe arches which do not
allow unaligned stores gain even more.

I don't know if gcc can actually make use of alignments greater than 8 for
anything, so one could probably drop the __assume_xyz_alignment macros and
just use __assume_aligned(8).

The increases in code size are mostly caused by gcc deciding to
opencode strlen() using the check-four-bytes-at-a-time trick when it
knows the buffer is sufficiently aligned (one function grew by 200
bytes). Now it turns out that many of these strlen() calls showing up
were in fact redundant, and they're gone from -next. Applying the two
patches to next-20151001 bloat-o-meter instead says

add/remove: 0/0 grow/shrink: 6/52 up/down: 244/-2140 (-1896)

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-20 16:17:32 -08:00
..
acpi Merge branch 'acpi-pci' 2015-11-07 01:30:10 +01:00
asm-generic h8300 update for v4.4 2015-11-12 15:26:39 -08:00
clocksource
crypto Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2015-11-05 15:32:38 -08:00
drm drm/atomic: add a drm_atomic_clean_old_fb helper. 2015-11-17 13:02:14 +02:00
dt-bindings ARM: DT updates for v4.4 2015-11-10 15:06:26 -08:00
keys KEYS: Merge the type-specific data with the payload data 2015-10-21 15:18:36 +01:00
kvm s390: A bunch of fixes and optimizations for interrupt and time 2015-11-05 16:26:26 -08:00
linux slab.h: sprinkle __assume_aligned attributes 2015-11-20 16:17:32 -08:00
math-emu
media [media] v4l2: add support for SDR transmitter 2015-10-20 15:40:50 -02:00
memory
misc
net net: switchdev: fix return code of fdb_dump stub 2015-11-16 15:24:37 -05:00
pcmcia
ras
rdma IB/core, cma: Make __attribute_const__ declarations sparse-friendly 2015-10-30 17:57:49 -04:00
rxrpc
scsi scsi: use host wide tags by default 2015-11-09 17:11:57 -08:00
soc ARM: SoC driver updates for v4.4 2015-11-10 15:00:03 -08:00
sound ALSA: Constify ratden/ratnum constraints 2015-10-28 11:42:22 +01:00
target Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2015-11-13 20:04:17 -08:00
trace Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2015-11-11 09:03:01 -08:00
uapi VFIO updates for v4.4-rc1 2015-11-13 17:05:32 -08:00
video drm/exynos/decon5433: add support for DECON-TV 2015-11-03 11:46:37 +09:00
xen xen/grant-table: Add an helper to iterate over a specific number of grants 2015-10-23 14:20:46 +01:00
Kbuild