linux_dsm_epyc7002/arch/x86
Naoya Horiguchi 124049decb x86/e820: put !E820_TYPE_RAM regions into memblock.reserved
There is a kernel panic that is triggered when reading /proc/kpageflags
on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':

  BUG: unable to handle kernel paging request at fffffffffffffffe
  PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
  Oops: 0000 [#1] SMP PTI
  CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
  RIP: 0010:stable_page_flags+0x27/0x3c0
  Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
  RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202
  RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0
  RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001
  R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0
  R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10
  FS:  00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0
  Call Trace:
   kpageflags_read+0xc7/0x120
   proc_reg_read+0x3c/0x60
   __vfs_read+0x36/0x170
   vfs_read+0x89/0x130
   ksys_pread64+0x71/0x90
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7efc42e75e23
  Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24

According to kernel bisection, this problem became visible due to commit
f7f99100d8 ("mm: stop zeroing memory during allocation in vmemmap")
which changes how struct pages are initialized.

Memblock layout affects the pfn ranges covered by node/zone.  Consider
that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
the default (no memmap= given) memblock layout is like below:

  MEMBLOCK configuration:
   memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000
   memory.cnt  = 0x4
   memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
   memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
   memory[0x2]     [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0
   memory[0x3]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
   ...

If you give memmap=1G!4G (so it just covers memory[0x2]),
the range [0x100000000-0x13fffffff] is gone:

  MEMBLOCK configuration:
   memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000
   memory.cnt  = 0x3
   memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
   memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
   memory[0x2]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
   ...

This causes shrinking node 0's pfn range because it is calculated by the
address range of memblock.memory.  So some of struct pages in the gap
range are left uninitialized.

We have a function zero_resv_unavail() which does zeroing the struct pages
within the reserved unavailable range (i.e.  memblock.memory &&
!memblock.reserved).  This patch utilizes it to cover all unavailable
ranges by putting them into memblock.reserved.

Link: http://lkml.kernel.org/r/20180615072947.GB23273@hori1.linux.bs1.fc.nec.co.jp
Fixes: f7f99100d8 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Tested-by: Oscar Salvador <osalvador@suse.de>
Tested-by: "Herton R. Krzesinski" <herton@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-28 11:16:44 -07:00
..
boot efi/x86: Fix incorrect invocation of PciIo->Attributes() 2018-06-24 09:05:58 +02:00
configs x86/unwind: Rename unwinder config options to 'CONFIG_UNWINDER_*' 2017-10-14 10:12:12 +02:00
crypto crypto: x86/salsa20 - remove x86 salsa20 implementations 2018-05-31 00:13:57 +08:00
entry rseq: Avoid infinite recursion when delivering SIGSEGV 2018-06-22 19:04:22 +02:00
events treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
hyperv x86/hyper-v: move struct hv_flush_pcpu{,ex} definitions to common header 2018-05-26 14:14:33 +02:00
ia32 syscalls/x86: auto-create compat_sys_*() prototypes 2018-04-02 20:16:18 +02:00
include Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-24 19:48:30 +08:00
kernel x86/e820: put !E820_TYPE_RAM regions into memblock.reserved 2018-06-28 11:16:44 -07:00
kvm kvm: vmx: Nested VM-entry prereqs for event inj. 2018-06-22 16:46:26 +02:00
lib libnvdimm for 4.18 2018-06-08 17:21:52 -07:00
math-emu License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mm Merge branch 'linus' into x86/urgent 2018-06-22 21:20:35 +02:00
net treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
oprofile x86/oprofile: Fix bogus GCC-8 warning in nmi_setup() 2018-02-21 09:54:17 +01:00
pci treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
platform treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
power x86/mm: Stop pretending pgtable_l5_enabled is a variable 2018-05-19 11:56:57 +02:00
purgatory kernel/kexec_file.c: move purgatories sha256 to common code 2018-04-13 17:10:28 -07:00
ras License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
realmode x86-64/realmode: Add instruction suffix 2018-02-20 09:33:41 +01:00
tools x86: Treat R_X86_64_PLT32 as R_X86_64_PC32 2018-02-22 09:01:10 -08:00
um Kconfig updates for v4.18 2018-06-06 11:31:45 -07:00
video
xen Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-24 19:59:52 +08:00
.gitignore x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore 2018-02-13 14:10:29 +01:00
Kbuild
Kconfig Kbuild: rename HAVE_CC_STACKPROTECTOR config variable 2018-06-15 07:15:28 +09:00
Kconfig.cpu Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-03-25 07:36:02 -10:00
Kconfig.debug x86, nfit_test: Add unit test for memcpy_mcsafe() 2018-05-22 23:18:31 -07:00
Makefile Merge branch 'linus' into x86/urgent 2018-06-22 21:20:35 +02:00
Makefile_32.cpu License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Makefile.um License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00