linux_dsm_epyc7002/arch
Baoquan He 98b57685c2 mm: memmap defer init doesn't work as expected
commit dc2da7b45ffe954a0090f5d0310ed7b0b37d2bd2 upstream.

VMware observed a performance regression during memmap init on their
platform and bisected it to commit 73a6e474cb ("mm: memmap_init:
iterate over memblock regions rather that check each PFN").

Before the commit:

  [0.033176] Normal zone: 1445888 pages used for memmap
  [0.033176] Normal zone: 89391104 pages, LIFO batch:63
  [0.035851] ACPI: PM-Timer IO Port: 0x448

With the commit:

  [0.026874] Normal zone: 1445888 pages used for memmap
  [0.026875] Normal zone: 89391104 pages, LIFO batch:63
  [2.028450] ACPI: PM-Timer IO Port: 0x448

The root cause is that the current deferred memmap init doesn't work as
expected.

Before, memmap_init_zone() did the memmap init of one whole zone at a
time: it initialized all low zones of a numa node, but deferred the
memmap init of the last zone in that node.  However, since commit
73a6e474cb, memmap_init() iterates over the memblock regions inside a
zone and calls memmap_init_zone() to do the memmap init for each
region.

E.g., on VMware's system, the memory layout is as below; there are two
memory regions in node 2.  The current code mistakenly initializes the
whole 1st region [mem 0xab00000000-0xfcffffffff], then defers memmap
init after initializing only one memory section of the 2nd region [mem
0x10000000000-0x1033fffffff].  In fact, we expect only one memory
section's memmap to be initialized in the whole zone.  That is why the
extra time is spent during boot.

[    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[    0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
[    0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
[    0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
[    0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]

Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
down the real zone end pfn so that defer_init() can use it to decide
whether deferral should be applied zone-wide.
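
The effect of passing the zone end instead of the region end can be
illustrated with a small user-space model.  This is only a sketch, not
the kernel's code: defer_init_model(), count_early_init() and struct
region are hypothetical names, the 32768-page section size is an
assumption (128 MiB sections with 4 KiB pages), and the zone is assumed
to be the last one in its node so that the node end equals the zone end.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 128 MiB sections with 4 KiB pages. */
#define PAGES_PER_SECTION 32768UL

/* Hypothetical stand-in for one memblock region, end_pfn exclusive. */
struct region {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* State kept across calls, like the static counters in a defer check. */
static unsigned long prev_end_pfn, nr_initialised;

/*
 * Simplified model of the deferral decision: never defer while the
 * end pfn handed in lies below the node end; otherwise defer at the
 * first section-aligned pfn after one section's worth of pages.
 */
static bool defer_init_model(unsigned long pfn, unsigned long end_pfn,
			     unsigned long node_end_pfn)
{
	if (prev_end_pfn != end_pfn) {
		prev_end_pfn = end_pfn;
		nr_initialised = 0;
	}
	if (end_pfn < node_end_pfn)
		return false;
	nr_initialised++;
	return nr_initialised > PAGES_PER_SECTION &&
	       (pfn & (PAGES_PER_SECTION - 1)) == 0;
}

/*
 * Count the pages initialised early for one zone made of n regions.
 * pass_zone_end == false models handing the region end to the check,
 * pass_zone_end == true models handing down the real zone end.
 */
static unsigned long count_early_init(const struct region *r, int n,
				      bool pass_zone_end)
{
	assert(n > 0);
	unsigned long zone_end = r[n - 1].end_pfn;
	unsigned long node_end = zone_end; /* assumed: last zone of the node */
	unsigned long count = 0;

	prev_end_pfn = 0;
	nr_initialised = 0;
	for (int i = 0; i < n; i++) {
		unsigned long limit = pass_zone_end ? zone_end : r[i].end_pfn;

		for (unsigned long pfn = r[i].start_pfn; pfn < r[i].end_pfn; pfn++) {
			if (defer_init_model(pfn, limit, node_end))
				break; /* rest of this region is deferred */
			count++;
		}
	}
	return count;
}
```

Feeding the model the two node 2 regions from the SRAT log, i.e.
{ 0xab00000, 0xfd00000 } and { 0x10000000, 0x10340000 } as pfn ranges,
it counts 86016000 early-initialised pages when the region end is
passed (the whole 1st region plus one section of the 2nd) and 32768
pages, a single section, when the zone end is passed.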

Link: https://lkml.kernel.org/r/20201223080811.16211-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20201223080811.16211-2-bhe@redhat.com
Fixes: commit 73a6e474cb ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reported-by: Rahul Gopakumar <gopakumarr@vmware.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-06 14:56:50 +01:00
..
alpha sched/idle: Fix arch_cpu_idle() vs tracing 2020-11-24 16:47:35 +01:00
arc asm-generic: add correct MAX_POSSIBLE_PHYSMEM_BITS setting 2020-11-27 15:00:35 -08:00
arm ARM: tegra: Populate OPP table for Tegra20 Ventana 2020-12-30 11:54:15 +01:00
arm64 KVM: arm64: Introduce handling of AArch32 TTBCR2 traps 2020-12-30 11:54:14 +01:00
c6x arch-cleanup-2020-10-22 2020-10-23 10:06:38 -07:00
csky Yet two more places which invoke tracing from RCU disabled regions in the 2020-11-29 11:19:26 -08:00
h8300 sched/idle: Fix arch_cpu_idle() vs tracing 2020-11-24 16:47:35 +01:00
hexagon sched/idle: Fix arch_cpu_idle() vs tracing 2020-11-24 16:47:35 +01:00
ia64 mm: memmap defer init doesn't work as expected 2021-01-06 14:56:50 +01:00
m68k m68k: Fix WARNING splat in pmac_zilog driver 2020-12-30 11:54:11 +01:00
microblaze sched/idle: Fix arch_cpu_idle() vs tracing 2020-11-24 16:47:35 +01:00
mips MIPS: Don't round up kernel sections size for memblock_add() 2020-12-30 11:53:34 +01:00
nds32 arch-cleanup-2020-10-22 2020-10-23 10:06:38 -07:00
nios2 sched/idle: Fix arch_cpu_idle() vs tracing 2020-11-24 16:47:35 +01:00
openrisc sched/idle: Fix arch_cpu_idle() vs tracing 2020-11-24 16:47:35 +01:00
parisc sched/idle: Fix arch_cpu_idle() vs tracing 2020-11-24 16:47:35 +01:00
powerpc powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrently 2020-12-30 11:54:16 +01:00
riscv RISC-V: Fix usage of memblock_enforce_memory_limit 2020-12-30 11:54:13 +01:00
s390 s390/idle: fix accounting with machine checks 2020-12-30 11:54:08 +01:00
sh sched/idle: Fix arch_cpu_idle() vs tracing 2020-11-24 16:47:35 +01:00
sparc sparc: fix handling of page table constructor failure 2020-12-30 11:53:55 +01:00
um um: Fix time-travel mode 2020-12-30 11:54:17 +01:00
x86 x86/CPU/AMD: Save AMD NodeId as cpu_die_id 2020-12-30 11:54:29 +01:00
xtensa xtensa: uaccess: Add missing __user to strncpy_from_user() prototype 2020-11-17 05:09:28 -08:00
.gitignore
Kconfig Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS" 2020-12-30 11:54:29 +01:00