linux_dsm_epyc7002/Documentation/core-api
Vlastimil Babka 59bb47985c mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
In most configurations, kmalloc() happens to return naturally aligned
(i.e.  aligned to the block size itself) blocks for power of two sizes.

That means some kmalloc() users might unknowingly rely on that
alignment, until stuff breaks when the kernel is built with e.g.
CONFIG_SLUB_DEBUG or CONFIG_SLOB, and blocks stop being aligned.  Then
developers have to devise workaround such as own kmem caches with
specified alignment [1], which is not always practical, as recently
evidenced in [2].

The topic has been discussed at LSF/MM 2019 [3].  Adding a
'kmalloc_aligned()' variant would not help with code unknowingly relying
on the implicit alignment.  For slab implementations it would either
require creating more kmalloc caches, or allocate a larger size and only
give back part of it.  That would be wasteful, especially with a generic
alignment parameter (in contrast with a fixed alignment to size).

Ideally we should provide to mm users what they need without difficult
workarounds or own reimplementations, so let's make the kmalloc()
alignment to size explicitly guaranteed for power-of-two sizes under all
configurations.  What this means for the three available allocators?

* SLAB object layout happens to be mostly unchanged by the patch.  The
  implicitly provided alignment could be compromised with
  CONFIG_DEBUG_SLAB due to redzoning, however SLAB disables redzoning for
  caches with alignment larger than unsigned long long.  Practically on at
  least x86 this includes kmalloc caches as they use cache line alignment,
  which is larger than that.  Still, this patch ensures alignment on all
  arches and cache sizes.

* SLUB layout is also unchanged unless redzoning is enabled through
  CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache.
  With this patch, explicit alignment is guaranteed with redzoning as
  well.  This will result in more memory being wasted, but that should be
  acceptable in a debugging scenario.

* SLOB has no implicit alignment so this patch adds it explicitly for
  kmalloc().  The potential downside is increased fragmentation.  While
  pathological allocation scenarios are certainly possible, in my testing,
  after booting a x86_64 kernel+userspace with virtme, around 16MB memory
  was consumed by slab pages both before and after the patch, with
  difference in the noise.

[1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
[2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
[3] https://lwn.net/Articles/787740/

[akpm@linux-foundation.org: documentation fixlet, per Matthew]
Link: http://lkml.kernel.org/r/20190826111627.7505-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: David Sterba <dsterba@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J . Wong" <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-07 15:47:20 -07:00
..
assoc_array.rst Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
atomic_ops.rst locking/atomics/Documentation: Describe atomic_set() as a write operation 2018-07-17 09:30:31 +02:00
boot-time-mm.rst docs/boot-time-mm: remove bootmem documentation 2018-10-31 08:54:16 -07:00
cachetlb.rst ia64/tlb: Eradicate tlb_migrate_finish() callback 2019-04-03 10:33:04 +02:00
circular-buffers.rst doc: Remove ".vnet" from paulmck email addresses 2019-05-28 09:02:57 -07:00
cpu_hotplug.rst Documentation: Update CPU hotplug and move it to core-api 2017-01-13 10:32:32 -07:00
debug-objects.rst doc: debugobjects: actually pull in the kerneldoc comments 2016-11-29 14:44:14 -07:00
errseq.rst errseq: Add to documentation tree 2018-01-01 12:40:27 -07:00
gcc-plugins.rst docs: move gcc_plugins.txt to core-api and rename to .rst 2019-07-15 09:20:27 -03:00
genalloc.rst doc: Add documentation for the genalloc subsystem 2017-08-30 16:49:04 -06:00
generic-radix-tree.rst generic radix trees 2019-03-12 10:04:02 -07:00
genericirq.rst genericirq.rst: Remove :c:func:... in code blocks 2017-12-02 08:41:46 -07:00
gfp_mask-from-fs-io.rst docs: core-api/gfp_mask-from-fs-io: add a label for cross-referencing 2018-09-20 11:02:32 -06:00
idr.rst idr: Change documentation license 2018-10-15 16:31:29 -04:00
index.rst docs: index.rst: don't use genindex for pdf output 2019-07-31 13:31:16 -06:00
kernel-api.rst kernel-doc: core-api: include string.h into core-api 2019-09-25 17:51:39 -07:00
librs.rst docs-rst: convert librs book to ReST 2017-05-16 08:44:16 -03:00
local_ops.rst timer: Remove init_timer() interface 2017-11-21 15:57:09 -08:00
memory-allocation.rst mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) 2019-10-07 15:47:20 -07:00
memory-hotplug.rst docs/core-api: memory-hotplug: add some details about locking internals 2018-10-12 11:14:19 -06:00
mm-api.rst docs/core-api/mm: fix GFP combinations section name 2019-01-14 17:35:39 -07:00
packing.rst docs: packing: move it to core-api book and adjust markups 2019-07-31 13:30:01 -06:00
printk-formats.rst docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi] 2019-09-14 01:57:43 -06:00
protection-keys.rst docs: move protection-keys.rst to the core-api book 2019-06-08 13:42:12 -06:00
refcount-vs-atomic.rst refcount_t: Add ACQUIRE ordering on success for dec(sub)_and_test() variants 2019-02-04 09:03:31 +01:00
timekeeping.rst It's been a relatively busy cycle for docs: 2019-07-09 12:34:26 -07:00
tracepoint.rst doc: Sphinxify the tracepoint docbook 2016-11-29 14:44:23 -07:00
workqueue.rst Documentation: core-api: minor workqueue.rst cleanups 2017-09-18 17:29:27 -07:00
xarray.rst docs: remove :c:func: annotations from xarray.rst 2019-06-26 11:14:15 -06:00