linux_dsm_epyc7002/include/linux/sched
Jann Horn 16d51a590a sched/fair: Don't free p->numa_faults with concurrent readers
When going through execve(), zero out the NUMA fault statistics instead of
freeing them.

During execve, the task is reachable through procfs and the scheduler. A
concurrent /proc/*/sched reader can read data from a freed ->numa_faults
allocation (confirmed by KASAN) and write it back to userspace.
I believe that it would also be possible for a use-after-free read to occur
through a race between a NUMA fault and execve(): task_numa_fault() can
lead to task_numa_compare(), which invokes task_weight() on the currently
running task of a different CPU.

Another way to fix this would be to make ->numa_faults RCU-managed or add
extra locking, but it seems easier to wipe the NUMA fault statistics on
execve.
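
The approach described above (wipe the statistics on execve, free them only at final teardown) can be sketched in userspace C. This is an illustrative sketch, not the kernel patch itself: `struct task`, `NR_STATS`, `task_stats_alloc()` and `task_stats_release()` are hypothetical stand-ins, and the `final` flag is an assumption about how the two callers might be told apart. It also does not model the concurrency; it only shows that the non-final path keeps the allocation alive, so a reader holding the pointer sees zeroed counters rather than freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_STATS 8 /* hypothetical number of counters */

/* Hypothetical stand-in for the per-task NUMA fault statistics. */
struct task {
	unsigned long *faults;      /* plays the role of p->numa_faults */
	unsigned long total_faults; /* running total of all counters */
};

/* Allocate zeroed counters; returns 0 on success, -1 on failure. */
int task_stats_alloc(struct task *t)
{
	assert(t != NULL);
	t->faults = calloc(NR_STATS, sizeof(*t->faults));
	t->total_faults = 0;
	return t->faults ? 0 : -1;
}

/*
 * Release the statistics.
 *
 * final == true:  the task is going away and no reader can reach it,
 *                 so the buffer is detached and freed.
 * final == false: the task lives on (the execve-like case), so the
 *                 buffer is kept and only its contents are reset.
 */
void task_stats_release(struct task *t, bool final)
{
	unsigned long *faults;
	size_t i;

	assert(t != NULL);
	faults = t->faults;
	if (!faults)
		return;

	if (final) {
		t->faults = NULL;
		free(faults);
	} else {
		t->total_faults = 0;
		for (i = 0; i < NR_STATS; i++)
			faults[i] = 0;
	}
}
```

In this sketch, an execve-like path would call `task_stats_release(t, false)` and the final teardown path would call `task_stats_release(t, true)`.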

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Fixes: 82727018b0 ("sched/numa: Call task_numa_free() from do_execve()")
Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-07-25 15:37:04 +02:00
autogroup.h
clock.h
coredump.h oom, oom_reaper: do not enqueue same task twice 2019-02-01 15:46:23 -08:00
cpufreq.h
cputime.h
deadline.h
debug.h
hotplug.h
idle.h
init.h
isolation.h KVM: LAPIC: Inject timer interrupt via posted interrupt 2019-07-20 09:00:40 +02:00
jobctl.h cgroup: cgroup v2 freezer 2019-04-19 11:26:48 -07:00
loadavg.h
mm.h coredump: fix race condition between collapse_huge_page() and core dumping 2019-06-13 17:34:56 -10:00
nohz.h sched/fair: Remove the rq->cpu_load[] update code 2019-06-03 11:49:38 +02:00
numa_balancing.h sched/fair: Don't free p->numa_faults with concurrent readers 2019-07-25 15:37:04 +02:00
prio.h
rt.h
signal.h signal: simplify set_user_sigmask/restore_user_sigmask 2019-07-16 19:23:24 -07:00
smt.h
stat.h
sysctl.h sched/uclamp: Add system default clamps 2019-06-24 19:23:45 +02:00
task_stack.h sched/core: Convert task_struct.stack_refcount to refcount_t 2019-02-04 08:53:56 +01:00
task.h clone: fix CLONE_PIDFD support 2019-07-14 20:36:12 +02:00
topology.h sched/uclamp: Add CPU's clamp buckets refcounting 2019-06-24 19:23:44 +02:00
user.h keys: Move the user and user-session keyrings to the user_namespace 2019-06-26 21:02:32 +01:00
wake_q.h locking/rwsem: Always release wait_lock before waking up tasks 2019-06-17 12:28:00 +02:00
xacct.h