linux_dsm_epyc7002/kernel
Juri Lelli 9a68fa0ebb sched/features: Fix hrtick reprogramming
[ Upstream commit 156ec6f42b8d300dbbf382738ff35c8bad8f4c3a ]

Hung tasks and RCU stall cases were reported on systems which were not
100% busy. Investigation of such unexpected cases (no sign of potential
starvation caused by tasks hogging the system) pointed out that the
periodic sched tick timer wasn't serviced anymore after a certain point
and that caused all machinery that depends on it (timers, RCU, etc.) to
stop working as well. This issue was, however, only reproducible if
HRTICK was enabled.

Looking at core dumps, it was found that the rbtree of the hrtimer base
also used for the hrtick was corrupted (i.e. the next timer as seen from
the base root and the actual leftmost node obtained by traversing the
tree were different). The same base is also used for the periodic tick
hrtimer, which might get "lost" if the rbtree gets corrupted.

Much like what is described in commit 1f71addd34 ("tick/sched: Do not
mess with an enqueued hrtimer"), there is a race window between
hrtimer_set_expires() in hrtick_start() and hrtimer_start_expires() in
__hrtick_restart() in which the former might be operating on an already
queued hrtick hrtimer, which might lead to corruption of the base.

Use hrtimer_start() (which removes the timer before enqueuing it back) to
ensure hrtick hrtimer reprogramming is entirely guarded by the base
lock, so that no race conditions can occur.
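
The failure mode described above can be illustrated with a small user-space
sketch. It is not kernel code: every toy_* name is hypothetical, a sorted
singly linked list stands in for the rbtree of the hrtimer base, and the
base lock is left out entirely. It only shows why rewriting the expiry of
an already queued timer can leave the cached first entry out of step with
the true earliest timer, while removing the timer before enqueuing it back
keeps the two in agreement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an hrtimer: an expiry plus an "enqueued" flag. */
struct toy_timer {
	long expires;
	bool queued;
	struct toy_timer *next;
};

/*
 * Toy stand-in for an hrtimer base: a list kept sorted by expiry.
 * head plays the role of the cached "next" timer seen from the base.
 */
struct toy_base {
	struct toy_timer *head;
};

/* Sorted insert; the timer must not already be queued. */
static void toy_enqueue(struct toy_base *base, struct toy_timer *t)
{
	struct toy_timer **link = &base->head;

	assert(!t->queued);
	while (*link && (*link)->expires <= t->expires)
		link = &(*link)->next;
	t->next = *link;
	*link = t;
	t->queued = true;
}

/* Unlink the timer if it is queued; otherwise do nothing. */
static void toy_remove(struct toy_base *base, struct toy_timer *t)
{
	struct toy_timer **link = &base->head;

	if (!t->queued)
		return;
	while (*link && *link != t)
		link = &(*link)->next;
	if (*link == t)
		*link = t->next;
	t->next = NULL;
	t->queued = false;
}

/* Walk the whole list to find the timer that really expires first. */
static struct toy_timer *toy_actual_first(const struct toy_base *base)
{
	struct toy_timer *best = base->head;
	struct toy_timer *t;

	for (t = base->head; t; t = t->next)
		if (t->expires < best->expires)
			best = t;
	return best;
}

/* The base is consistent when the cached first entry is the true one. */
static bool toy_base_consistent(const struct toy_base *base)
{
	return base->head == toy_actual_first(base);
}

/* Problem pattern: change the expiry without touching the queue. */
static void toy_set_expires_in_place(struct toy_timer *t, long expires)
{
	t->expires = expires;
}

/* Safe pattern: remove first, then update the expiry and enqueue again. */
static void toy_restart(struct toy_base *base, struct toy_timer *t,
			long expires)
{
	toy_remove(base, t);
	t->expires = expires;
	toy_enqueue(base, t);
}
```

In this sketch, calling toy_set_expires_in_place() on a queued timer makes
toy_base_consistent() report a mismatch of the kind seen in the core dumps,
whereas toy_restart() keeps the list ordered. The real fix additionally
relies on the whole sequence running under the base lock, which this model
does not attempt to represent.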

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210208073554.14629-2-juri.lelli@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-07 12:34:13 +01:00
..
bpf bpf: Clear subreg_def for global function return values 2021-03-04 11:37:34 +01:00
cgroup cgroup-v1: add disabled controller check in cgroup1_parse_param() 2021-02-17 11:02:25 +01:00
configs
debug kgdb: fix to kill breakpoints on initmem after boot 2021-03-04 11:38:46 +01:00
dma
entry
events exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:46:24 +01:00
gcov
irq genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set 2021-02-10 09:29:17 +01:00
kcsan kcsan: Rewrite kcsan_prandom_u32_max() without prandom_u32_state() 2021-03-04 11:37:37 +01:00
livepatch
locking locking/lockdep: Avoid unmatched unlock 2021-03-04 11:37:47 +01:00
power PM: hibernate: flush swap writer after marking 2021-02-03 23:28:40 +01:00
printk printk: fix deadlock when kernel panic 2021-03-04 11:38:41 +01:00
rcu rcu/nocb: Perform deferred wake up before last idle's need_resched() check 2021-03-04 11:38:35 +01:00
sched sched/features: Fix hrtick reprogramming 2021-03-07 12:34:13 +01:00
time tick/sched: Remove bogus boot "safety" check 2021-01-06 14:56:55 +01:00
trace bpf: Unbreak BPF_PROG_TYPE_KPROBE when kprobe is called via do_int3 2021-02-17 11:02:25 +01:00
.gitignore
acct.c
async.c
audit_fsnotify.c fsnotify: generalize handle_inode_event() 2020-12-30 11:54:18 +01:00
audit_tree.c fsnotify: generalize handle_inode_event() 2020-12-30 11:54:18 +01:00
audit_watch.c fsnotify: generalize handle_inode_event() 2020-12-30 11:54:18 +01:00
audit.c
audit.h
auditfilter.c
auditsc.c
backtracetest.c
bounds.c
capability.c
compat.c
configs.c
context_tracking.c
cpu_pm.c
cpu.c
crash_core.c
crash_dump.c
cred.c
delayacct.c
dma.c
exec_domain.c
exit.c kernel/io_uring: cancel io_uring before task works 2021-01-30 13:55:18 +01:00
extable.c
fail_function.c
fork.c exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:46:24 +01:00
freezer.c
futex.c futex: Handle faults correctly for PI futexes 2021-01-30 13:55:17 +01:00
gen_kheaders.sh
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c
kallsyms.c
kcmp.c exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:46:24 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c
kexec_core.c kernel: kexec: remove the lock operation of system_transition_mutex 2021-02-03 23:28:37 +01:00
kexec_elf.c
kexec_file.c ima: Free IMA measurement buffer after kexec syscall 2021-03-04 11:37:50 +01:00
kexec_internal.h
kexec.c
kheaders.c
kmod.c
kprobes.c kprobes: Fix to delay the kprobes jump optimization 2021-03-04 11:38:35 +01:00
ksysfs.c
kthread.c kthread: Extract KTHREAD_IS_PER_CPU 2021-02-07 15:37:17 +01:00
latencytop.c
Makefile kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE 2021-03-04 11:38:41 +01:00
module_signature.c
module_signing.c
module-internal.h
module.c module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols 2021-03-04 11:38:39 +01:00
notifier.c
nsproxy.c
padata.c
panic.c
params.c
pid_namespace.c
pid.c exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:46:24 +01:00
profile.c
ptrace.c
range.c
reboot.c
regset.c
relay.c
resource.c
rseq.c
scftorture.c
scs.c
seccomp.c seccomp: Add missing return in non-void function 2021-03-04 11:38:32 +01:00
signal.c
smp.c smp: Process pending softirqs in flush_smp_call_function_from_idle() 2021-03-04 11:37:51 +01:00
smpboot.c kthread: Extract KTHREAD_IS_PER_CPU 2021-02-07 15:37:17 +01:00
smpboot.h
softirq.c
stackleak.c
stacktrace.c
static_call.c
stop_machine.c
sys_ni.c
sys.c
sysctl-test.c
sysctl.c
task_work.c
taskstats.c
test_kprobes.c
torture.c
tracepoint.c tracepoint: Do not fail unregistering a probe due to memory failure 2021-03-04 11:38:03 +01:00
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c
up.c
user_namespace.c
user-return-notifier.c
user.c
usermode_driver.c
utsname_sysctl.c
utsname.c
watch_queue.c
watchdog_hld.c
watchdog.c
workqueue_internal.h
workqueue.c workqueue: Restrict affinity change to rescuer 2021-02-07 15:37:17 +01:00