linux_dsm_epyc7002/arch/powerpc/mm
Srikar Dronamraju 67df77845c powerpc/numa: Restrict possible nodes based on platform
As per draft LoPAPR (Revision 2.9_pre7), section B.5.3 "Run Time
Abstraction Services (RTAS) Node" available at:
  https://openpowerfoundation.org/wp-content/uploads/2020/07/LoPAR-20200611.pdf

... there are 2 device tree properties:

  "ibm,max-associativity-domains"
   which defines the maximum number of domains that the firmware i.e
   PowerVM can support.

and:

  "ibm,current-associativity-domains"
   which defines the maximum number of domains that the current
   platform can support.

The value of "ibm,max-associativity-domains" is always greater than or
equal to "ibm,current-associativity-domains" property. If the latter
property is not available, use "ibm,max-associativity-domain" as a
fallback. In this yet to be released LoPAPR, "ibm,current-associativity-domains"
is mentioned in page 833 / B.5.3 which is covered under under
"Appendix B. System Binding" section

Currently powerpc uses the "ibm,max-associativity-domains" property
while setting the possible number of nodes. This is currently set at
32. However the possible number of nodes for a platform may be
significantly less. Hence set the possible number of nodes based on
"ibm,current-associativity-domains" property.

Nathan Lynch had raised a valid concern that post LPM (Live Partition
Migration), a user could DLPAR add processors and memory after LPM
with "new" associativity properties:
  https://lore.kernel.org/linuxppc-dev/871rljfet9.fsf@linux.ibm.com/t/#u

He also pointed out that "ibm,max-associativity-domains" has the same
contents on all currently available PowerVM systems, unlike
"ibm,current-associativity-domains" and hence may be better able to
handle the new NUMA associativity properties.

However with the recent commit dbce456280 ("powerpc/numa: Limit
possible nodes to within num_possible_nodes"), all new NUMA
associativity properties are capped to initially set nr_node_ids.
Hence this commit should be safe with any new DLPAR add post LPM.

  $ lsprop /proc/device-tree/rtas/ibm,*associ*-domains
  /proc/device-tree/rtas/ibm,current-associativity-domains
  		 00000005 00000001 00000002 00000002 00000002 00000010
  /proc/device-tree/rtas/ibm,max-associativity-domains
  		 00000005 00000001 00000008 00000020 00000020 00000100

  $ cat /sys/devices/system/node/possible ##Before patch
  0-31

  $ cat /sys/devices/system/node/possible ##After patch
  0-1

Note the maximum nodes this platform can support is only 2 but the
possible nodes is set to 32.

This is important because lot of kernel and user space code allocate
structures for all possible nodes leading to a lot of memory that is
allocated but not used.

I ran a simple experiment to create and destroy 100 memory cgroups on
boot on a 8 node machine (Power8 Alpine).

Before patch:
  free -k at boot
                total        used        free      shared  buff/cache   available
  Mem:      523498176     4106816   518820608       22272      570752   516606720
  Swap:       4194240           0     4194240

  free -k after creating 100 memory cgroups
                total        used        free      shared  buff/cache   available
  Mem:      523498176     4628416   518246464       22336      623296   516058688
  Swap:       4194240           0     4194240

  free -k after destroying 100 memory cgroups
                total        used        free      shared  buff/cache   available
  Mem:      523498176     4697408   518173760       22400      627008   515987904
  Swap:       4194240           0     4194240

After patch:
  free -k at boot
                total        used        free      shared  buff/cache   available
  Mem:      523498176     3969472   518933888       22272      594816   516731776
  Swap:       4194240           0     4194240

  free -k after creating 100 memory cgroups
                total        used        free      shared  buff/cache   available
  Mem:      523498176     4181888   518676096       22208      640192   516496448
  Swap:       4194240           0     4194240

  free -k after destroying 100 memory cgroups
                total        used        free      shared  buff/cache   available
  Mem:      523498176     4232320   518619904       22272      645952   516443264
  Swap:       4194240           0     4194240

Observations:
  Fixed kernel takes 137344 kb (4106816-3969472) less to boot.
  Fixed kernel takes 309184 kb (4628416-4181888-137344) less to create 100 memcgs.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[mpe: Reformat change log a bit for readability]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200817055257.110873-1-srikar@linux.vnet.ibm.com
2020-09-16 22:05:19 +10:00
..
book3s32 powerpc/32s: Fix module loading failure when VMALLOC_END is over 0xf0000000 2020-08-21 23:30:25 +10:00
book3s64 powerepc/book3s64/hash: Align start/end address correctly with bolt mapping 2020-09-15 22:13:38 +10:00
kasan powerpc/kasan: Fix CONFIG_KASAN_VMALLOC for 8xx 2020-09-15 22:13:37 +10:00
nohash powerpc/8xx: Support 16k hugepages with 4k pages 2020-09-15 22:13:31 +10:00
ptdump powerpc/8xx: Support 16k hugepages with 4k pages 2020-09-15 22:13:31 +10:00
copro_fault.c mm: clean up the last pieces of page fault accountings 2020-08-12 10:58:04 -07:00
dma-noncoherent.c dma-mapping: drop the dev argument to arch_sync_dma_for_* 2019-11-20 20:31:38 +01:00
drmem.c pseries/drmem: don't cache node id in drmem_lmb struct 2020-09-02 11:00:21 +10:00
fault.c mm/powerpc: use general page fault accounting 2020-08-12 10:58:03 -07:00
highmem.c arch/kunmap_atomic: consolidate duplicate code 2020-06-04 19:06:22 -07:00
hugetlbpage.c powerpc/8xx: Support 16k hugepages with 4k pages 2020-09-15 22:13:31 +10:00
init_32.c Merge branch 'akpm' (patches from Andrew) 2020-08-07 11:39:33 -07:00
init_64.c Merge branch 'fixes' into next 2020-09-14 22:57:18 +10:00
init-common.c mm: reorder includes after introduction of linux/pgtable.h 2020-06-09 09:39:13 -07:00
ioremap_32.c powerpc/ioremap: warn on early use of ioremap() 2019-11-19 19:38:38 +11:00
ioremap_64.c powerpc: remove __ioremap_at and __iounmap_at 2020-06-02 10:59:10 -07:00
ioremap.c mm/memremap_pages: Introduce memremap_compat_align() 2020-02-20 16:58:55 -08:00
Makefile powerpc/mm: Move ioremap functions out of pgtable_32/64.c 2019-08-27 13:03:35 +10:00
mem.c powerpc/pseries/svm: Allocate SWIOTLB buffer anywhere in memory 2020-09-14 23:07:14 +10:00
mmap.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
mmu_context.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
mmu_decl.h powerpc/8xx: Don't set IMMR map anymore at boot 2020-05-26 22:22:21 +10:00
numa.c powerpc/numa: Restrict possible nodes based on platform 2020-09-16 22:05:19 +10:00
pgtable_32.c mm: pgtable: add shortcuts for accessing kernel PMD and PTE 2020-06-09 09:39:13 -07:00
pgtable_64.c mm: remove unneeded includes of <asm/pgalloc.h> 2020-08-07 11:33:26 -07:00
pgtable-frag.c powerpc/mm/radix: Fix PTE/PMD fragment count for early page table mappings 2020-07-20 22:57:56 +10:00
pgtable.c powerpc/8xx: Refactor calculation of number of entries per PTE in page tables 2020-09-15 22:13:31 +10:00
slice.c powerpc: Replace _ALIGN_UP() by ALIGN() 2020-05-11 23:15:15 +10:00