linux_dsm_epyc7002/fs/proc
Yang Shi 88aa7cc688 mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
mmap_sem is on the hot path of kernel, and it very contended, but it is
abused too.  It is used to protect arg_start|end and evn_start|end when
reading /proc/$PID/cmdline and /proc/$PID/environ, but it doesn't make
sense since those proc files just expect to read 4 values atomically and
not related to VM, they could be set to arbitrary values by C/R.

And, the mmap_sem contention may cause unexpected issue like below:

INFO: task ps:14018 blocked for more than 120 seconds.
       Tainted: G            E 4.9.79-009.ali3000.alios7.x86_64 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
 ps              D    0 14018      1 0x00000004
 Call Trace:
   schedule+0x36/0x80
   rwsem_down_read_failed+0xf0/0x150
   call_rwsem_down_read_failed+0x18/0x30
   down_read+0x20/0x40
   proc_pid_cmdline_read+0xd9/0x4e0
   __vfs_read+0x37/0x150
   vfs_read+0x96/0x130
   SyS_read+0x55/0xc0
   entry_SYSCALL_64_fastpath+0x1a/0xc5

Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock
for them to mitigate the abuse of mmap_sem.

So, introduce a new spinlock in mm_struct to protect the concurrent
access to arg_start|end, env_start|end and others, as well as replace
write map_sem to read to protect the race condition between prctl and
sys_brk which might break check_data_rlimit(), and makes prctl more
friendly to other VM operations.

This patch just eliminates the abuse of mmap_sem, but it can't resolve
the above hung task warning completely since the later
access_remote_vm() call needs acquire mmap_sem.  The mmap_sem
scalability issue will be solved in the future.

[yang.shi@linux.alibaba.com: add comment about mmap_sem and arg_lock]
  Link: http://lkml.kernel.org/r/1524077799-80690-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1523730291-109696-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:34 -07:00
..
array.c Merge branch 'for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2018-06-05 17:31:33 -07:00
base.c mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct 2018-06-07 17:34:34 -07:00
cmdline.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
consoles.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
cpuinfo.c x86 / CPU: Always show current CPU frequency in /proc/cpuinfo 2017-11-15 19:46:50 +01:00
devices.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
fd.c procfs: switch instantiate_t to d_splice_alias() 2018-05-26 14:20:50 -04:00
fd.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
generic.c Merge branch 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-04 13:46:22 -07:00
inode.c proc: move "struct proc_dir_entry" into kmem cache 2018-04-11 10:28:34 -07:00
internal.h Merge branch 'for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2018-06-05 17:31:33 -07:00
interrupts.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
Kconfig vmcore: add API to collect hardware dump in second kernel 2018-05-14 13:46:04 -04:00
kcore.c proc/kcore: don't bounds check against address 0 2018-05-11 17:28:45 -07:00
kmsg.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
loadavg.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
Makefile proc: : uninline name_to_int() 2017-11-17 16:10:00 -08:00
meminfo.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
namespaces.c procfs: switch instantiate_t to d_splice_alias() 2018-05-26 14:20:50 -04:00
nommu.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
page.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
proc_net.c proc: introduce proc_create_net_single 2018-05-16 07:24:30 +02:00
proc_sysctl.c switch the rest of procfs lookups to d_splice_alias() 2018-05-26 14:20:50 -04:00
proc_tty.c tty: replace ->proc_fops with ->proc_show 2018-05-16 07:24:30 +02:00
root.c proc: use slower rb_first() 2018-04-11 10:28:34 -07:00
self.c proc: introduce a proc_pid_ns helper 2018-05-16 07:23:35 +02:00
softirqs.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
stat.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
task_mmu.c powerpc updates for 4.18 2018-06-07 10:23:33 -07:00
task_nommu.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
thread_self.c proc: introduce a proc_pid_ns helper 2018-05-16 07:23:35 +02:00
uptime.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
util.c proc: use do-while in name_to_int() 2017-11-17 16:10:00 -08:00
version.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
vmcore.c vmcore: move get_vmcore_size out of __init 2018-05-21 12:34:22 -04:00