linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-26 16:00:54 +07:00


Daniel Colascione 493b0e9d94 mm: add /proc/pid/smaps_rollup

/proc/pid/smaps_rollup is a new proc file that improves the performance of user programs that determine aggregate memory statistics (e.g., total PSS) of a process.

Android regularly "samples" the memory usage of various processes in order to balance its memory pool sizes. This sampling process involves opening /proc/pid/smaps and summing certain fields. For very large processes, sampling memory use this way can take several hundred milliseconds, due mostly to the overhead of the seq_printf calls in task_mmu.c.

smaps_rollup improves the situation. It contains most of the fields of /proc/pid/smaps, but instead of a set of fields for each VMA, smaps_rollup instead contains one synthetic smaps-format entry representing the whole process. In the single smaps_rollup synthetic entry, each field is the summation of the corresponding field in all of the real-smaps VMAs. Using a common format for smaps_rollup and smaps allows userspace parsers to repurpose parsers meant for use with non-rollup smaps for smaps_rollup, and it allows userspace to switch between smaps_rollup and smaps at runtime (say, based on the availability of smaps_rollup in a given kernel) with minimal fuss.

By using smaps_rollup instead of smaps, a caller can avoid the significant overhead of formatting, reading, and parsing each of a large process's potentially very numerous memory mappings. For sampling system_server's PSS in Android, we measured a 12x speedup, representing a savings of several hundred milliseconds.

One alternative to a new per-process proc file would have been including PSS information in /proc/pid/status. We considered this option but thought that PSS would be too expensive (by a few orders of magnitude) to collect relative to what's already emitted as part of /proc/pid/status, and slowing every user of /proc/pid/status for the sake of readers that happen to want PSS feels wrong.
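As a rough illustration of the userspace side described above, the following Python sketch sums one field (by default `Pss`) over every entry in smaps-format text. Because smaps_rollup shares the smaps format, the same parser handles both files. The function names `sum_field_kb` and `read_pss_kb` are hypothetical, and falling back to smaps when smaps_rollup is missing is an assumed policy, not something the commit prescribes.

```python
import re

# Matches smaps-style field lines such as "Pss:                 120 kB".
_FIELD_RE = re.compile(r"^(\w+):\s+(\d+) kB$")


def sum_field_kb(text, field="Pss"):
    """Sum one kB-valued field over every entry in smaps-format text.

    Works for /proc/pid/smaps (one entry per VMA) and for
    /proc/pid/smaps_rollup (a single synthetic entry) alike.
    """
    total = 0
    for line in text.splitlines():
        match = _FIELD_RE.match(line.strip())
        if match and match.group(1) == field:
            total += int(match.group(2))
    return total


def read_pss_kb(pid):
    """Return total PSS in kB, preferring smaps_rollup when it exists."""
    for name in ("smaps_rollup", "smaps"):
        try:
            with open(f"/proc/{pid}/{name}") as f:
                return sum_field_kb(f.read())
        except FileNotFoundError:
            continue  # kernel without smaps_rollup, or the process is gone
    raise ProcessLookupError(f"no smaps data for pid {pid}")
```

On a Linux system, `read_pss_kb(os.getpid())` would report the calling process's PSS in kB. On kernels that predate this commit the first open fails and the slower per-VMA file is parsed instead.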

The code itself works by reusing the existing VMA-walking framework we use for regular smaps generation and keeping the mem_size_stats structure around between VMA walks instead of using a fresh one for each VMA. In this way, summation happens automatically. We let seq_file walk over the VMAs just as it does for regular smaps and just emit nothing to the seq_file until we hit the last VMA.

Benchmarks:

    using smaps:
    iterations:1000 pid:1163 pss:220023808
     0m29.46s real     0m08.28s user     0m20.98s system

    using smaps_rollup:
    iterations:1000 pid:1163 pss:220702720
     0m04.39s real     0m00.03s user     0m04.31s system

We're using the PSS samples we collect asynchronously for system-management tasks like fine-tuning oom_adj_score, memory use tracking for debugging, application-level memory-use attribution, and deciding whether we want to kill large processes during system idle maintenance windows. Android has been using PSS for these purposes for a long time; as the average process VMA count has increased and devices become more efficiency-conscious, PSS-collection inefficiency has started to matter more. IMHO, it'd be a lot safer to optimize the existing PSS-collection model, which has been fine-tuned over the years, instead of changing the memory tracking approach entirely to work around smaps-generation inefficiency.

Tim said:

: There are two main reasons why Android gathers PSS information:
:
: 1. Android devices can show the user the amount of memory used per application via the settings app. This is a less important use case.
:
: 2. We log PSS to help identify leaks in applications. We have found an enormous number of bugs (in the Android platform, in Google's own apps, and in third-party applications) using this data.
:
: To do this, system_server (the main process in Android userspace) will sample the PSS of a process three seconds after it changes state (for example, app is launched and becomes the foreground application) and about every ten minutes after that.
: The net result is that PSS collection is regularly running on at least one process in the system (usually a few times a minute while the screen is on, less when screen is off due to suspend). PSS of a process is an incredibly useful stat to track, and we aren't going to get rid of it. We've looked at some very hacky approaches using RSS ("take the RSS of the target process, subtract the RSS of the zygote process that is the parent of all Android apps") to reduce the accounting time, but it regularly overestimated the memory used by 20+ percent. Accordingly, I don't think that there's a good alternative to using PSS.
:
: We started looking into PSS collection performance after we noticed random frequency spikes while a phone's screen was off; occasionally, one of the CPU clusters would ramp to a high frequency because there was 200-300ms of constant CPU work from a single thread in the main Android userspace process. The work causing the spike (which is reasonable governor behavior given the amount of CPU time needed) was always PSS collection. As a result, Android is burning more power than we should be on PSS collection.
:
: The other issue (and why I'm less sure about improving smaps as a long-term solution) is that the number of VMAs per process has increased significantly from release to release. After trying to figure out why we were seeing these 200-300ms PSS collection times on Android O but had not noticed it in previous versions, we found that the number of VMAs in the main system process increased by 50% from Android N to Android O (from ~1800 to ~2700) and varying increases in every userspace process. Android M to N also had an increase in the number of VMAs, although not as much. I'm not sure why this is increasing so much over time, but thinking about ASLR and ways to make ASLR better, I expect that this will continue to increase going forward.
: I would not be surprised if we hit 5000 VMAs on the main Android process (system_server) by 2020.
:
: If we assume that the number of VMAs is going to increase over time, then doing anything we can do to reduce the overhead of each VMA during PSS collection seems like the right way to go, and that means outputting an aggregate statistic (to avoid whatever overhead there is per line in writing smaps and in reading each line from userspace).

Link: http://lkml.kernel.org/r/20170812022148.178293-1-dancol@google.com
Signed-off-by: Daniel Colascione <dancol@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sonny Rao <sonnyrao@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2017-09-06 17:27:30 -07:00
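
The implementation idea in the commit message above (reuse the per-VMA walk, keep one statistics accumulator alive across VMAs, and emit nothing until the last VMA) can be modelled outside the kernel. The sketch below is a Python toy, not the kernel code: `MemSizeStats`, `show_smaps`, and the VMA dictionaries are hypothetical stand-ins for `mem_size_stats` and the seq_file show callback, and only two fields are tracked.

```python
from dataclasses import dataclass


@dataclass
class MemSizeStats:
    """Toy stand-in for the kernel's mem_size_stats (values in kB)."""
    rss: int = 0
    pss: int = 0

    def account(self, vma):
        self.rss += vma["rss"]
        self.pss += vma["pss"]


def show_smaps(vmas, rollup=False):
    """Yield one (label, stats) entry per VMA, or one summed entry if rollup.

    In rollup mode the same accumulator survives across VMAs, so summation
    falls out of the normal walk; nothing is emitted until the last VMA.
    """
    shared = MemSizeStats()
    for index, vma in enumerate(vmas):
        # Regular smaps starts from a fresh accumulator for every VMA.
        stats = shared if rollup else MemSizeStats()
        stats.account(vma)
        is_last = index == len(vmas) - 1
        if not rollup:
            yield vma["name"], stats
        elif is_last:
            yield "[rollup]", stats


# Made-up mappings, purely for illustration.
vmas = [
    {"name": "/bin/app", "rss": 460, "pss": 100},
    {"name": "[heap]", "rss": 64, "pss": 64},
    {"name": "[stack]", "rss": 12, "pss": 12},
]
per_vma = list(show_smaps(vmas))
rolled = list(show_smaps(vmas, rollup=True))
```

Here `per_vma` holds three entries while `rolled` holds a single entry whose fields are the sums over all three mappings, which is the property that lets a reader skip formatting and parsing every individual VMA.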
..
9p	fscache: remove unused ->now_uncached callback	2017-09-06 17:27:26 -07:00
adfs
affs	affs: Implement show_options	2017-07-11 06:06:17 -04:00
afs	fscache: remove unused ->now_uncached callback	2017-09-06 17:27:26 -07:00
autofs4	Fix up over-eager 'wait_queue_t' renaming	2017-07-10 11:40:19 -07:00
befs	Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-15 12:00:42 -07:00
bfs	bfs: fix sanity checks for empty files	2017-07-12 16:26:00 -07:00
btrfs	Btrfs: fix blk_status_t/errno confusion	2017-08-24 17:19:02 +02:00
cachefiles	sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming	2017-06-20 12:19:14 +02:00
ceph	fscache: remove unused ->now_uncached callback	2017-09-06 17:27:26 -07:00
cifs	fscache: remove unused ->now_uncached callback	2017-09-06 17:27:26 -07:00
coda	fs: implement vfs_iter_write using do_iter_write	2017-06-29 17:49:23 -04:00
configfs	configfs: Introduce config_item_get_unless_zero()	2017-06-12 13:20:20 +02:00
cramfs
crypto	The first major feature for ext4 this merge window is the largedir	2017-07-09 09:31:22 -07:00
debugfs	Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-15 12:00:42 -07:00
devpts	pty: Repair TIOCGPTPEER	2017-08-24 13:23:03 -07:00
dlm	net: Work around lockdep limitation in sockets that use sockets	2017-03-09 18:23:27 -08:00
ecryptfs	ecryptfs: Convert to separately allocated bdi	2017-04-20 12:09:55 -06:00
efivarfs	VFS: Kill off s_options and helpers	2017-07-11 06:09:21 -04:00
efs
exofs	mm: drop "wait" parameter from write_one_page()	2017-07-05 18:44:22 -04:00
exportfs	Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-03-03 11:38:56 -08:00
ext2	dax: use common 4k zero page for dax mmap reads	2017-09-06 17:27:24 -07:00
ext4	mm: remove nr_pages argument from pagevec_lookup{,_range}()	2017-09-06 17:27:27 -07:00
f2fs	f2fs: avoid cpu lockup	2017-07-17 19:23:18 -07:00
fat	fat: fix using uninitialized fields of fat_inode/fsinfo_inode	2017-03-09 17:01:10 -08:00
freevxfs
fscache	mm: remove nr_pages argument from pagevec_lookup{,_range}()	2017-09-06 17:27:27 -07:00
fuse	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse	2017-08-11 11:20:48 -07:00
gfs2	Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-15 12:00:42 -07:00
hfs	fs: semove set but not checked AOP_FLAG_UNINTERRUPTIBLE flag	2017-05-08 17:15:14 -07:00
hfsplus	hfsplus: Don't clear SGID when inheriting ACLs	2017-07-18 18:23:39 +02:00
hostfs
hpfs	sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>	2017-03-02 08:42:32 +01:00
hugetlbfs	mm: remove nr_pages argument from pagevec_lookup{,_range}()	2017-09-06 17:27:27 -07:00
isofs	isofs: Fix off-by-one in 'session' mount option parsing	2017-07-18 12:33:16 +02:00
jbd2	Writeback error handling fixes (pile #2)	2017-07-07 19:38:17 -07:00
jffs2	jffs2: fix spelling mistake: "requestied" -> "requested"	2017-04-19 11:35:55 -07:00
jfs	jfs should use MAX_LFS_FILESIZE when calculating s_maxbytes	2017-08-31 17:02:21 -07:00
kernfs	kernfs: Clarify lockdep name for kn->count	2017-08-28 16:50:15 +02:00
lockd	sunrpc: mark all struct svc_version instances as const	2017-07-13 15:58:03 -04:00
minix	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-08 10:50:54 -07:00
ncpfs	mm: per-cgroup memory reclaim stats	2017-07-06 16:24:35 -07:00
nfs	fscache: remove unused ->now_uncached callback	2017-09-06 17:27:26 -07:00
nfs_common
nfsd	annotate RWF_... flags	2017-08-31 17:32:38 -04:00
nilfs2	mm: remove nr_pages argument from pagevec_lookup{,_range}()	2017-09-06 17:27:27 -07:00
nls
notify	dentry name snapshots	2017-07-07 20:09:10 -04:00
ntfs	ntfs: Use ERR_CAST() to avoid cross-structure cast	2017-05-28 10:11:48 -07:00
ocfs2	ocfs2: clean up some dead code	2017-09-06 17:27:24 -07:00
omfs	omfs: Implement show_options	2017-07-06 03:31:46 -04:00
openpromfs
orangefs	Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-15 12:00:42 -07:00
overlayfs	overlayfs, locking: Remove smp_mb__before_spinlock() usage	2017-08-10 12:29:02 +02:00
proc	mm: add /proc/pid/smaps_rollup	2017-09-06 17:27:30 -07:00
pstore	Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-15 12:00:42 -07:00
qnx4
qnx6
quota	quota: correct space limit check	2017-08-07 16:51:28 +02:00
ramfs	mm: make pagevec_lookup() update index	2017-09-06 17:27:26 -07:00
reiserfs	reiserfs: preserve i_mode if __reiserfs_set_acl() fails	2017-07-18 11:24:08 +02:00
romfs
squashfs	fs/pstore: fs/squashfs: change usage of LZ4 to work with new LZ4 version	2017-02-24 17:46:57 -08:00
sysfs	sysfs: be careful of error returns from ops->show()	2017-04-08 17:33:32 +02:00
sysv	mm: drop "wait" parameter from write_one_page()	2017-07-05 18:44:22 -04:00
tracefs	VFS: Don't use save/replace_mount_options if not using generic_show_options	2017-07-06 03:31:46 -04:00
ubifs	ubifs: Set double hash cookie also for RENAME_EXCHANGE	2017-07-14 22:50:57 +02:00
udf	udf: Convert udf_disk_stamp_to_time() to use mktime64()	2017-06-14 11:21:02 +02:00
ufs	Writeback error handling fixes (pile #1)	2017-07-07 18:39:15 -07:00
xfs	dax: use common 4k zero page for dax mmap reads	2017-09-06 17:27:24 -07:00
aio.c	fs: add O_DIRECT and aio support for sending down write life time hints	2017-06-27 12:05:36 -06:00
anon_inodes.c
attr.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h>	2017-03-02 08:42:29 +01:00
bad_inode.c	statx: Add a system call to make enhanced file info available	2017-03-02 20:51:15 -05:00
binfmt_aout.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h>	2017-03-02 08:42:36 +01:00
binfmt_elf_fdpic.c	sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h>	2017-03-02 08:42:39 +01:00
binfmt_elf.c	x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks	2017-08-16 20:32:02 +02:00
binfmt_em86.c
binfmt_flat.c	binfmt_flat: Use %u to format u32	2017-07-16 09:24:05 -07:00
binfmt_misc.c	fs: constify tree_descr arrays passed to simple_fill_super()	2017-04-26 23:54:06 -04:00
binfmt_script.c
block_dev.c	Writeback error handling fixes (pile #2)	2017-07-07 19:38:17 -07:00
buffer.c	mm: remove nr_pages argument from pagevec_lookup{,_range}()	2017-09-06 17:27:27 -07:00
char_dev.c	char_dev: order /proc/devices by major number	2017-07-17 15:28:50 +02:00
compat_binfmt_elf.c
compat_ioctl.c	Merge branch 'work.__copy_in_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-08 10:15:02 -07:00
compat.c	fs/compat.c: trim unused includes	2017-04-17 12:52:27 -04:00
coredump.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h>	2017-03-02 08:42:36 +01:00
dax.c	dax: initialize variable pfn before using it	2017-09-06 17:27:24 -07:00
dcache.c	Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-15 12:00:42 -07:00
dcookies.c
direct-io.c	fs: add O_DIRECT and aio support for sending down write life time hints	2017-06-27 12:05:36 -06:00
drop_caches.c
eventfd.c	There has been a fair amount of activity in the docs tree this time	2017-07-03 21:13:25 -07:00
eventpoll.c	epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()	2017-09-01 13:07:35 -07:00
exec.c	exec: Limit arg stack to at most 75% of _STK_LIM	2017-07-07 20:05:08 -07:00
fcntl.c	vfs: fix flock compat thinko	2017-07-07 13:48:18 -07:00
fhandle.c	fhandle: move compat syscalls from compat.c	2017-04-17 12:52:26 -04:00
file_table.c	fs: new infrastructure for writeback error handling and reporting	2017-07-06 07:02:25 -04:00
file.c	fs/file.c: replace alloc_fdmem() with kvmalloc() alternative	2017-07-06 16:24:30 -07:00
filesystems.c	Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-15 12:00:42 -07:00
fs_pin.c	sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming	2017-06-20 12:19:14 +02:00
fs_struct.c	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h>	2017-03-02 08:42:35 +01:00
fs-writeback.c	writeback: rework wb_[dec|inc]_stat family of functions	2017-07-12 16:26:05 -07:00
inode.c	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-08 10:50:54 -07:00
internal.h	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-05-09 09:12:53 -07:00
ioctl.c	sched/headers: Prepare for the reduction of <linux/sched.h>'s signal API dependency	2017-03-02 08:42:37 +01:00
iomap.c	iomap: fix integer truncation issues in the zeroing and dirtying helpers	2017-08-11 16:56:33 -07:00
Kconfig	fs/Kconfig: kill CONFIG_PERCPU_RWSEM some more	2017-07-12 16:26:00 -07:00
Kconfig.binfmt
libfs.c	fs: convert __generic_file_fsync to use errseq_t based reporting	2017-07-06 07:02:29 -04:00
locks.c	fs/locks: pass kernel struct flock to fcntl_getlk/setlk	2017-05-27 06:07:19 -04:00
Makefile
mbcache.c	ext4: xattr inode deduplication	2017-06-22 11:44:55 -04:00
mount.h	Now that IPC and other changes have landed, enable manual markings for	2017-07-19 08:55:18 -07:00
mpage.c	There has been a fair amount of activity in the docs tree this time	2017-07-03 21:13:25 -07:00
namei.c	Now that IPC and other changes have landed, enable manual markings for	2017-07-19 08:55:18 -07:00
namespace.c	Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-15 12:00:42 -07:00
no-block.c
nsfs.c	VFS: Provide empty name qstr	2017-07-06 03:27:09 -04:00
open.c	Writeback error handling fixes (pile #2)	2017-07-07 19:38:17 -07:00
pipe.c	VFS: Provide empty name qstr	2017-07-06 03:27:09 -04:00
pnode.c	mnt: Make propagate_umount less slow for overlapping mount propagation trees	2017-05-23 08:41:17 -05:00
pnode.h
posix_acl.c	sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>	2017-03-02 08:42:31 +01:00
proc_namespace.c	sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h>	2017-03-02 08:42:38 +01:00
read_write.c	annotate RWF_... flags	2017-08-31 17:32:38 -04:00
readdir.c	readdir: move compat syscalls from compat.c	2017-04-17 12:52:24 -04:00
select.c	fs/select: Fix memory corruption in compat_get_fd_set()	2017-08-28 16:09:19 -07:00
seq_file.c	mm: introduce kv[mz]alloc helpers	2017-05-08 17:15:12 -07:00
signalfd.c	sched/wait: Rename wait_queue_t => wait_queue_entry_t	2017-06-20 12:18:27 +02:00
splice.c	fs: implement vfs_iter_write using do_iter_write	2017-06-29 17:49:23 -04:00
stack.c
stat.c	ufs: restore maintaining ->i_blocks	2017-06-09 16:28:01 -04:00
statfs.c	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-08 10:50:54 -07:00
super.c	VFS: Kill off s_options and helpers	2017-07-11 06:09:21 -04:00
sync.c	fs/sync.c: remove unnecessary NULL f_mapping check in sync_file_range	2017-09-06 17:27:28 -07:00
timerfd.c	timerfd: Use get_itimerspec64() and put_itimerspec64()	2017-06-30 04:14:38 -04:00
userfaultfd.c	userfaultfd: provide pid in userfault msg - add feat union	2017-09-06 17:27:29 -07:00
utimes.c	utimes: move compat syscalls from compat.c	2017-04-17 12:52:23 -04:00
xattr.c	treewide: use kv[mz]alloc* rather than opencoded variants	2017-05-08 17:15:13 -07:00