linux_dsm_epyc7002/include
Yang Shi 1eb6234e52 mm: swap: make page_evictable() inline
When backporting commit 9c4e6b1a70 ("mm, mlock, vmscan: no more skipping
pagevecs") to our 4.9 kernel, our test bench noticed around 10% down with
a couple of vm-scalability's test cases (lru-file-readonce,
lru-file-readtwice and lru-file-mmap-read).  I didn't see that much down
on my VM (32c-64g-2nodes).  It might be caused by the test configuration,
which is 32c-256g with NUMA disabled and the tests were run in root memcg,
so the tests actually stress only one inactive and active lru.  It sounds
not very usual in mordern production environment.

That commit did two major changes:
1. Call page_evictable()
2. Use smp_mb to force the PG_lru set visible

It looks they contribute the most overhead.  The page_evictable() is a
function which does function prologue and epilogue, and that was used by
page reclaim path only.  However, lru add is a very hot path, so it sounds
better to make it inline.  However, it calls page_mapping() which is not
inlined either, but the disassemble shows it doesn't do push and pop
operations and it sounds not very straightforward to inline it.

Other than this, it sounds smp_mb() is not necessary for x86 since
SetPageLRU is atomic which enforces memory barrier already, replace it
with smp_mb__after_atomic() in the following patch.

With the two fixes applied, the tests can get back around 5% on that test
bench and get back normal on my VM.  Since the test bench configuration is
not that usual and I also saw around 6% up on the latest upstream, so it
sounds good enough IMHO.

The below is test data (lru-file-readtwice throughput) against the v5.6-rc4:
	mainline	w/ inline fix
          150MB            154MB

With this patch the throughput gets 2.67% up.  The data with using
smp_mb__after_atomic() is showed in the following patch.

Shakeel Butt did the below test:

On a real machine with limiting the 'dd' on a single node and reading 100
GiB sparse file (less than a single node).  Just ran a single instance to
not cause the lru lock contention.  The cmdline used is "dd if=file-100GiB
of=/dev/null bs=4k".  Ran the cmd 10 times with drop_caches in between and
measured the time it took.

Without patch: 56.64143 +- 0.672 sec

With patches: 56.10 +- 0.21 sec

[akpm@linux-foundation.org: move page_evictable() to internal.h]
Fixes: 9c4e6b1a70 ("mm, mlock, vmscan: no more skipping pagevecs")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: http://lkml.kernel.org/r/1584500541-46817-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:27 -07:00
..
acpi Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2020-03-30 16:17:15 -07:00
asm-generic asm-generic: make more kernel-space headers mandatory 2020-04-02 09:35:25 -07:00
clocksource
crypto
drm
dt-bindings media updates for v5.7-rc1 2020-03-30 13:42:05 -07:00
keys
kunit
kvm irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layer 2020-03-24 12:15:51 +00:00
linux mm: swap: make page_evictable() inline 2020-04-02 09:35:27 -07:00
math-emu
media media: cec-notifier: make cec_notifier_get_conn() static 2020-03-20 09:02:45 +01:00
misc
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-03-31 17:29:33 -07:00
pcmcia
ras
rdma
scsi scsi: simplify scsi_partsize 2020-03-24 07:57:07 -06:00
soc net: dsa: felix: add port policers 2020-03-30 11:44:00 -07:00
sound
target
trace Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-03-31 17:29:33 -07:00
uapi Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-03-31 17:29:33 -07:00
vdso vdso: Fix clocksource.h macro detection 2020-03-23 18:51:08 +01:00
video
xen