linux_dsm_epyc7002/include
Vlastimil Babka 53853e2d2b mm, compaction: defer each zone individually instead of preferred zone
When direct sync compaction is often unsuccessful, it may become deferred
for some time to avoid further useless attempts, both sync and async.
Successful high-order allocations un-defer compaction, while further
unsuccessful compaction attempts prolong the compaction deferred period.

Currently the checking and setting deferred status is performed only on
the preferred zone of the allocation that invoked direct compaction.  But
compaction itself is attempted on all eligible zones in the zonelist, so
the behavior is suboptimal and may lead both to scenarios where 1)
compaction is attempted uselessly, or 2) where it's not attempted despite
good chances of succeeding, as shown on the examples below:

1) A direct compaction with Normal preferred zone failed and set
   deferred compaction for the Normal zone.  Another unrelated direct
   compaction with DMA32 as preferred zone will attempt to compact DMA32
   zone even though the first compaction attempt also included DMA32 zone.

   In another scenario, compaction with Normal preferred zone failed to
   compact Normal zone, but succeeded in the DMA32 zone, so it will not
   defer compaction.  In the next attempt, it will try Normal zone which
   will fail again, instead of skipping Normal zone and trying DMA32
   directly.

2) Kswapd will balance DMA32 zone and reset defer status based on
   watermarks looking good.  A direct compaction with preferred Normal
   zone will skip compaction of all zones including DMA32 because Normal
   was still deferred.  The allocation might have succeeded in DMA32, but
   won't.

This patch makes compaction deferring work on individual zone basis
instead of preferred zone.  For each zone, it checks compaction_deferred()
to decide if the zone should be skipped.  If watermarks fail after
compacting the zone, defer_compaction() is called.  The zone where
watermarks passed can still be deferred when the allocation attempt is
unsuccessful.  When allocation is successful, compaction_defer_reset() is
called for the zone containing the allocated page.  This approach should
approximate calling defer_compaction() only on zones where compaction was
attempted and did not yield allocated page.  There might be corner cases
but that is inevitable as long as the decision to stop compacting dues not
guarantee that a page will be allocated.

Due to a new COMPACT_DEFERRED return value, some functions relying
implicitly on COMPACT_SKIPPED = 0 had to be updated, with comments made
more accurate.  The did_some_progress output parameter of
__alloc_pages_direct_compact() is removed completely, as the caller
actually does not use it after compaction sets it - it is only considered
when direct reclaim sets it.

During testing on a two-node machine with a single very small Normal zone
on node 1, this patch has improved success rates in stress-highalloc
mmtests benchmark.  The success here were previously made worse by commit
3a025760fc ("mm: page_alloc: spill to remote nodes before waking
kswapd") as kswapd was no longer resetting often enough the deferred
compaction for the Normal zone, and DMA32 zones on both nodes were thus
not considered for compaction.  On different machine, success rates were
improved with __GFP_NO_KSWAPD allocations.

[akpm@linux-foundation.org: fix CONFIG_COMPACTION=n build]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:53 -04:00
..
acpi Merge branches 'acpi-hotplug', 'acpi-scan', 'acpi-lpss', 'acpi-gpio' and 'acpi-video' 2014-09-25 22:59:30 +02:00
asm-generic common: dma-mapping: introduce common remapping functions 2014-10-09 22:25:52 -04:00
clocksource
crypto crypto: drbg - backport "fix maximum value checks on 32 bit systems" 2014-09-05 15:52:28 +08:00
drm drm/radeon: add additional SI pci ids 2014-08-22 10:47:59 -04:00
dt-bindings ARM: SoC DT updates for 3.18 2014-10-08 17:22:23 -04:00
keys
kvm arm/arm64: KVM: vgic: delay vgic allocation until init time 2014-09-18 18:48:58 -07:00
linux mm, compaction: defer each zone individually instead of preferred zone 2014-10-09 22:25:53 -04:00
math-emu
media [media] vb2: fix VBI/poll regression 2014-09-21 20:57:30 -03:00
memory
misc
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-10-08 21:40:54 -04:00
pcmcia
ras
rdma IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get 2014-09-19 09:55:42 -07:00
rxrpc include/rxrpc/types.h: Remove unused header 2014-08-29 20:33:39 -07:00
scsi Merge remote-tracking branch 'scsi-queue/drivers-for-3.18' into for-linus 2014-10-07 13:48:12 -07:00
soc/tegra
sound ASoC: core: fix .info for SND_SOC_BYTES_TLV 2014-08-18 08:59:12 -05:00
target
trace Merge tag 'f2fs-for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs 2014-10-08 12:53:15 -04:00
uapi Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-10-08 21:40:54 -04:00
video gpu: ipu-v3: Add ipu-cpmem unit 2014-08-18 14:17:41 +02:00
xen xen/arm: introduce XENFEAT_grant_map_identity 2014-09-11 18:11:52 +00:00
Kbuild