linux_dsm_epyc7002/drivers/base
Jonathan Cameron a21558618c mm/memory_hotplug: fix leftover use of struct page during hotplug
The case of a new NUMA node got missed when we stopped using the node
info from struct page during hotplug.  In this path
register_mem_sect_under_node (which allows us to specify that this is
hotplug, so don't change the node) is reached via link_mem_sections,
which unfortunately does not allow that.

The fix is to pass check_nid through link_mem_sections as well and
disable it in the new NUMA node path.

Note the bug only 'sometimes' manifests depending on what happens to be
in the struct page structures - there are lots of them and it only needs
to match one of them.

The result of the bug is that (with a new memory only node) we never
successfully call register_mem_sect_under_node so don't get the memory
associated with the node in sysfs and meminfo for the node doesn't
report it.

It came up whilst testing some arm64 hotplug patches, but appears to be
universal.  Whilst I'm triggering it by removing then reinserting memory
to a node with no other elements (thus making the node disappear then
appear again), it appears it would happen on hotplugging memory where
there was none before, and it doesn't seem to be related to the arm64
patches.

These patches call __add_pages (where most of the issue was fixed by
Pavel's patch).  If there is a node at the time of the __add_pages call
then all is well as it calls register_mem_sect_under_node from there
with check_nid set to false.  Without a node that function returns
having not done the sysfs related stuff as there is no node to use.
This is expected but it is the resulting path that fails...

Exact path to the problem is as follows:

 mm/memory_hotplug.c: add_memory_resource()

   The node is not online, so we enter the 'if (new_node)' block twice.
   In the second such block there is a call to link_mem_sections, which
   calls into

  drivers/base/node.c: link_mem_sections() which calls

  drivers/base/node.c: register_mem_sect_under_node() which calls
     get_nid_for_pfn and keeps trying until the output of that matches
     the expected node (passed all the way down from
     add_memory_resource)

It is effectively the same fix as the one referred to in the Fixes tag,
just in the code path for a new node, where the comments point out we
have to rerun the link creation because it will have failed in
register_new_memory (as there was no node at the time).  (Actually that
comment is wrong now, as we don't have register_new_memory any more: it
got renamed to hotplug_memory_register in Pavel's patch.)

Link: http://lkml.kernel.org/r/20180504085311.1240-1-Jonathan.Cameron@huawei.com
Fixes: fc44f7f923 ("mm/memory_hotplug: don't read nid from struct page during hotplug")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-25 18:12:11 -07:00
firmware_loader firmware: some documentation fixes 2018-04-25 18:37:20 +02:00
power PM / core: Fix direct_complete handling for devices with no callbacks 2018-05-22 14:50:11 +02:00
regmap Merge remote-tracking branches 'regmap/topic/debugfs' and 'regmap/topic/mmio-clk' into regmap-next 2018-03-12 09:50:42 -07:00
test driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
arch_topology.c Revert "base: arch_topology: fix section mismatch build warnings" 2018-03-15 14:36:20 +01:00
attribute_container.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
base.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
bus.c drivers: base: omit redundant interations 2017-12-18 16:47:27 +01:00
cacheinfo.c Merge 4.15-rc6 into driver-core-next 2018-01-02 14:56:51 +01:00
class.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
component.c component: add debugfs support 2017-12-18 16:51:11 +01:00
container.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
core.c driver core: Introduce device links reference counting 2018-02-27 18:10:42 +01:00
cpu.c x86/bugs: Expose /sys/../spec_store_bypass 2018-05-03 13:55:47 +02:00
dd.c drivers: base: remove check for callback in coredump_store() 2018-03-23 18:08:02 +01:00
devcon.c drivers: base: Unified device connection lookup 2018-03-22 13:10:29 +01:00
devcoredump.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
devres.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
devtmpfs.c kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare() 2018-04-02 20:16:06 +02:00
dma-coherent.c dma-coherent: clarify dma_mmap_from_dev_coherent documentation 2018-04-23 14:44:17 +02:00
dma-contiguous.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
dma-mapping.c dma-mapping: postpone cpu addr translation on mmap 2018-04-23 14:44:24 +02:00
driver.c drivers: base: omit redundant interations 2017-12-18 16:47:27 +01:00
firmware.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
hypervisor.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
init.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
isa.c Merge 4.15-rc3 into driver-core-next 2017-12-11 08:50:05 +01:00
Kconfig drm/graphics pull request for v4.16-rc1 2018-02-01 17:48:47 -08:00
Makefile Driver core patches for 4.17-rc1 2018-04-04 19:41:45 -07:00
map.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
memory.c mm: check __highest_present_section_nr directly in memory_dev_init() 2018-04-11 10:28:31 -07:00
module.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
node.c mm/memory_hotplug: fix leftover use of struct page during hotplug 2018-05-25 18:12:11 -07:00
pinctrl.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
platform-msi.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
platform.c driver core: platform: use put_device() if device_register fail 2018-03-15 14:37:04 +01:00
property.c device property: Constify device_get_match_data() 2018-02-12 10:41:11 +01:00
soc.c base: soc: use put_device() instead of kfree() 2018-03-15 14:37:03 +01:00
syscore.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
topology.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
transport_class.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00