linux_dsm_epyc7002/kernel
Jack Miller ab602f7991 shm: make exit_shm work proportional to task activity
This is small set of patches our team has had kicking around for a few
versions internally that fixes tasks getting hung on shm_exit when there
are many threads hammering it at once.

Anton wrote a simple test to cause the issue:

  http://ozlabs.org/~anton/junkcode/bust_shm_exit.c

Before applying this patchset, this test code will cause either hanging
tracebacks or pthread out of memory errors.

After this patchset, it will still produce output like:

  root@somehost:~# ./bust_shm_exit 1024 160
  ...
  INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 116, t=2111 jiffies, g=241, c=240, q=7113)
  INFO: Stall ended before state dump start
  ...

But the task will continue to run along happily, so we consider this an
improvement over hanging, even if it's a bit noisy.

This patch (of 3):

exit_shm obtains the ipc_ns shm rwsem for write and holds it while it
walks every shared memory segment in the namespace.  Thus the amount of
work is related to the number of shm segments in the namespace not the
number of segments that might need to be cleaned.

In addition, this occurs after the task has been notified the thread has
exited, so the number of tasks waiting for the ns shm rwsem can grow
without bound until memory is exausted.

Add a list to the task struct of all shmids allocated by this task.  Init
the list head in copy_process.  Use the ns->rwsem for locking.  Add
segments after id is added, remove before removing from id.

On unshare of NEW_IPCNS orphan any ids as if the task had exited, similar
to handling of semaphore undo.

I chose a define for the init sequence since its a simple list init,
otherwise it would require a function call to avoid include loops between
the semaphore code and the task struct.  Converting the list_del to
list_del_init for the unshare cases would remove the exit followed by
init, but I left it blow up if not inited.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Jack Miller <millerjo@us.ibm.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:26 -07:00
..
bpf net: filter: split 'struct sk_filter' into socket and bpf parts 2014-08-02 15:03:58 -07:00
debug kdb: Use ktime_get_ts() 2014-06-12 16:18:45 +02:00
events mm: memcontrol: rewrite charge API 2014-08-08 15:57:17 -07:00
gcov kernel/gcov/fs.c: remove unnecessary null test before debugfs_remove 2014-08-08 15:57:24 -07:00
irq Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-08-05 17:38:45 -07:00
locking arch, locking: Ciao arch_mutex_cpu_relax() 2014-07-17 12:32:47 +02:00
power ACPI and power management updates for 3.17-rc1 2014-08-06 20:34:19 -07:00
printk kernel/printk/printk.c: fix bool assignements 2014-08-06 18:01:24 -07:00
rcu Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-08-04 15:55:08 -07:00
sched ACPI and power management updates for 3.17-rc1 2014-08-06 20:34:19 -07:00
time Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-08-05 17:46:42 -07:00
trace Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-08-05 17:46:42 -07:00
.gitignore Ignore generated file kernel/x509_certificate_list 2013-12-10 18:21:34 +00:00
acct.c sched: Make task->start_time nanoseconds based 2014-07-23 10:18:05 -07:00
async.c
audit_tree.c inotify: Fix reporting of cookies for inotify events 2014-02-18 11:17:17 +01:00
audit_watch.c inotify: Fix reporting of cookies for inotify events 2014-02-18 11:17:17 +01:00
audit.c CAPABILITIES: remove undefined caps from all processes 2014-07-24 21:53:47 +10:00
audit.h audit: Use struct net not pid_t to remember the network namespce to reply in 2014-03-20 10:10:53 -04:00
auditfilter.c kernel/auditfilter.c: replace count*size kmalloc by kcalloc 2014-08-06 18:01:12 -07:00
auditsc.c auditsc: audit_krule mask accesses need bounds checking 2014-06-10 08:44:40 -07:00
backtracetest.c kernel/backtracetest.c: replace no level printk by pr_info() 2014-06-04 16:54:14 -07:00
bounds.c page-cgroup: get rid of NR_PCG_FLAGS 2014-08-08 15:57:18 -07:00
capability.c CAPABILITIES: remove undefined caps from all processes 2014-07-24 21:53:47 +10:00
cgroup_freezer.c cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes 2014-07-15 11:05:09 -04:00
cgroup.c Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-08-04 10:11:28 -07:00
compat.c kernel/compat.c: use sizeof() instead of sizeof 2014-06-04 16:54:19 -07:00
configs.c
context_tracking.c x86/kprobes: Fix build errors and blacklist context_track_user 2014-06-14 09:07:44 +02:00
cpu_pm.c
cpu.c sched: Rework check_for_tasks() 2014-07-05 11:17:45 +02:00
cpuset.c Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-08-04 10:11:28 -07:00
crash_dump.c
cred.c
delayacct.c delayacct: Remove braindamaged type conversions 2014-07-23 10:18:06 -07:00
dma.c
elfcore.c switch elf_core_write_extra_phdrs() to dump_emit() 2013-11-09 00:16:23 -05:00
exec_domain.c kernel/exec_domain.c: code clean-up 2014-06-04 16:54:15 -07:00
exit.c kernel/exit.c: fix coding style warnings and errors 2014-08-08 15:57:22 -07:00
extable.c asmlinkage: Make main_extable_sort_needed visible 2014-02-13 18:13:22 -08:00
fork.c shm: make exit_shm work proportional to task activity 2014-08-08 15:57:26 -07:00
freezer.c libata, freezer: avoid block device removal while system is frozen 2013-12-19 13:50:32 -05:00
futex_compat.c compat: Get rid of (get|put)_compat_time(val|spec) 2014-02-02 14:09:12 -08:00
futex.c futex: Simplify futex_lock_pi_atomic() and make it more robust 2014-06-21 22:26:24 +02:00
groups.c kernel/groups.c: remove return value of set_groups 2014-04-03 16:21:05 -07:00
hung_task.c kernel/hung_task.c: convert simple_strtoul to kstrtouint 2014-06-04 16:54:15 -07:00
irq_work.c irq_work: Remove BUG_ON in irq_work_run() 2014-07-05 11:17:26 +02:00
jump_label.c static_key: WARN on usage before jump_label_init was called 2013-10-19 19:45:35 -04:00
kallsyms.c kernel/kallsyms.c: fix %pB when there's no symbol at the address 2014-08-08 15:57:18 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS 2013-11-15 09:32:22 +09:00
Kconfig.locks locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER 2014-07-16 14:57:13 +02:00
Kconfig.preempt
kexec.c kexec: fix build error when hugetlbfs is disabled 2014-07-30 20:09:37 -07:00
kmod.c signals: change wait_for_helper() to use kernel_sigaction() 2014-06-06 16:08:12 -07:00
kprobes.c kprobes: Fix "Failed to find blacklist" probing errors on ia64 and ppc64 2014-07-18 06:23:40 +02:00
ksysfs.c kobject: Make support for uevent_helper optional. 2014-04-25 12:00:49 -07:00
kthread.c kthread_work: wake up worker only when the worker is idle 2014-07-28 14:07:52 -04:00
latencytop.c kernel/latencytop.c: convert seq_printf to seq_puts 2014-06-04 16:54:15 -07:00
Makefile Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-08-06 09:38:14 -07:00
module_signing.c keys: change asymmetric keys to use common hash definitions 2013-10-25 17:15:18 -04:00
module-internal.h KEYS: Separate the kernel signature checking keyring from module signing 2013-09-25 17:17:01 +01:00
module.c crypto: fips - only panic on bad/missing crypto mod signatures 2014-07-03 21:38:32 +08:00
notifier.c kprobes, notifier: Use NOKPROBE_SYMBOL macro in notifier 2014-04-24 10:26:39 +02:00
nsproxy.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2013-09-07 14:35:32 -07:00
padata.c padata: Fix wrong usage of rcu_dereference() 2013-12-05 21:28:42 +08:00
panic.c panic: add TAINT_SOFTLOCKUP 2014-08-08 15:57:24 -07:00
params.c Add module param type 'ullong' 2014-07-17 22:07:37 +02:00
pid_namespace.c pid_namespace: pidns_get() should check task_active_pid_ns() != NULL 2014-04-02 16:20:21 -07:00
pid.c pidns: fix free_pid() to handle the first fork failure 2013-09-30 14:31:03 -07:00
profile.c kernel/profile.c: use static const char instead of static char 2014-06-06 16:08:13 -07:00
ptrace.c sched: Remove proliferation of wait_on_bit() action functions 2014-07-16 15:10:39 +02:00
range.c
reboot.c kernel/reboot.c: convert simple_strtoul to kstrtoint 2014-06-04 16:54:15 -07:00
relay.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
res_counter.c kernel/res_counter.c: replace simple_strtoull by kstrtoull 2014-06-04 16:54:15 -07:00
resource.c resources: Clarify sanity check message 2014-05-23 10:47:21 -06:00
seccomp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-08-06 09:38:14 -07:00
signal.c signal: Explain local_irq_save() call 2014-07-09 09:14:33 -07:00
smp.c kernel/smp.c:on_each_cpu_cond(): fix warning in fallback path 2014-08-06 18:01:22 -07:00
smpboot.c
smpboot.h
softirq.c Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu 2014-05-22 11:36:10 +02:00
stacktrace.c
stop_machine.c kernel/stop_machine.c: kernel-doc warning fix 2014-06-04 16:54:15 -07:00
sys_ni.c seccomp: add "seccomp" syscall 2014-07-18 12:13:37 -07:00
sys.c sched: move no_new_privs into new atomic flags 2014-07-18 12:13:38 -07:00
sysctl_binary.c ipv6: Allow accepting RA from local IP addresses. 2014-07-01 12:16:24 -07:00
sysctl.c mm, hugetlb: remove hugetlb_zero and hugetlb_infinity 2014-08-06 18:01:19 -07:00
system_certificates.S KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
system_keyring.c KEYS: validate certificate trust only with builtin keys 2014-07-17 09:35:17 -04:00
task_work.c task_work: documentation 2013-09-11 15:58:27 -07:00
taskstats.c genetlink: only pass array to genl_register_family_with_ops() 2013-11-19 16:39:05 -05:00
test_kprobes.c kernel/test_kprobes.c: use current logging functions 2014-08-08 15:57:18 -07:00
torture.c torture: Avoid format string leak to thead name 2014-07-07 10:12:56 -07:00
tracepoint.c tracing: syscall_regfunc() should not skip kernel threads 2014-06-21 00:15:26 -04:00
tsacct.c sched: Make task->start_time nanoseconds based 2014-07-23 10:18:05 -07:00
uid16.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
up.c smp: Rename __smp_call_function_single() to smp_call_function_single_async() 2014-02-24 14:47:15 -08:00
user_namespace.c proc: constify seq_operations 2014-08-08 15:57:22 -07:00
user-return-notifier.c
user.c kernel/user.c: drop unused field 'files' from user_struct 2014-06-04 16:54:16 -07:00
utsname_sysctl.c sysctl: convert use of typedef ctl_table to struct ctl_table 2014-06-06 16:08:16 -07:00
utsname.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
watchdog.c panic: add TAINT_SOFTLOCKUP 2014-08-08 15:57:24 -07:00
workqueue_internal.h workqueue: rename manager_mutex to attach_mutex 2014-05-20 10:59:32 -04:00
workqueue.c Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2014-08-04 10:09:27 -07:00