linux_dsm_epyc7002/kernel/rcu
Sergey Senozhatsky 7f65cfa391 rcu/tree: Handle VM stoppage in stall detection
[ Upstream commit ccfc9dd6914feaa9a81f10f9cce56eb0f7712264 ]

The soft watchdog timer function checks if a virtual machine
was suspended and hence what looks like a lockup in fact
is a false positive.

This is what kvm_check_and_clear_guest_paused() does: it
tests guest PVCLOCK_GUEST_STOPPED (which is set by the host)
and if it's set then we need to touch all watchdogs and bail
out.

Watchdog timer function runs from IRQ, so PVCLOCK_GUEST_STOPPED
check works fine.

There is, however, one more watchdog that runs from IRQ, so
watchdog timer fn races with it, and that watchdog is not aware
of PVCLOCK_GUEST_STOPPED - RCU stall detector.

apic_timer_interrupt()
 smp_apic_timer_interrupt()
  hrtimer_interrupt()
   __hrtimer_run_queues()
    tick_sched_timer()
     tick_sched_handle()
      update_process_times()
       rcu_sched_clock_irq()

This triggers RCU stalls on our devices during VM resume.

If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
before watchdog_timer_fn()->kvm_check_and_clear_guest_paused()
then there is nothing on this VCPU that touches watchdogs and
RCU reads stale gp stall timestamp and new jiffies value, which
makes it think that RCU has stalled.

Make RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED and
don't report RCU stalls when we resume the VM.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-05 19:08:50 +02:00
..
Kconfig rcu: Reduce leaf fanout for strict RCU grace periods 2020-08-24 18:40:23 -07:00
Kconfig.debug Merge branch 'strictgp.2020.08.24a' into HEAD 2020-09-03 09:47:42 -07:00
Makefile rcuperf: Change rcuperf to rcuscale 2020-08-24 18:39:24 -07:00
rcu_segcblist.c rcu/segcblist: Prevent useless GP start if no CBs to accelerate 2020-09-03 09:39:59 -07:00
rcu_segcblist.h
rcu.h srcu: Fix broken node geometry after early ssp init 2021-07-20 16:05:38 +02:00
rcuscale.c rcuperf: Change rcuperf to rcuscale 2020-08-24 18:39:24 -07:00
rcutorture.c rcutorture: Allow pointer leaks to test diagnostic code 2020-08-24 18:45:36 -07:00
refscale.c refperf: Avoid null pointer dereference when buf fails to allocate 2020-08-24 18:45:35 -07:00
srcutiny.c srcu: Provide polling interfaces for Tiny SRCU grace periods 2024-07-05 19:01:27 +02:00
srcutree.c srcu: Provide polling interfaces for Tree SRCU grace periods 2024-07-05 19:01:27 +02:00
sync.c
tasks.h rcu-tasks: Don't delete holdouts within trc_wait_for_one_reader() 2021-07-31 08:16:11 +02:00
tiny.c rcu: Rename *_kfree_callback/*_kfree_rcu_offset/kfree_call_* 2020-06-29 11:59:25 -07:00
tree_exp.h rcu: Initialize at declaration time in rcu_exp_handler() 2020-08-24 18:36:03 -07:00
tree_plugin.h rcu/nocb: Perform deferred wake up before last idle's need_resched() check 2021-03-04 11:38:35 +01:00
tree_stall.h rcu/tree: Handle VM stoppage in stall detection 2024-07-05 19:08:50 +02:00
tree.c srcu: Fix broken node geometry after early ssp init 2021-07-20 16:05:38 +02:00
tree.h Merge branch 'strictgp.2020.08.24a' into HEAD 2020-09-03 09:47:42 -07:00
update.c rcu: Reject RCU_LOCKDEP_WARN() false positives 2021-07-20 16:05:38 +02:00