linux_dsm_epyc7002/kernel/time
Thomas Gleixner 041ad7bc75 timers: Prevent base clock rewind when forwarding clock
Ashton and Michael reported, that kernel versions 4.8 and later suffer from
USB timeouts which are caused by the timer wheel rework.

This is caused by a bug in the base clock forwarding mechanism, which leads
to timers expiring early. The scenario which leads to this is:

run_timers()
  while (jiffies >= base->clk) {
    collect_expired_timers();
    base->clk++;
    expire_timers();
  }          

So base->clk = jiffies + 1. Now the cpu goes idle:

idle()
  get_next_timer_interrupt()
    nextevt = __next_time_interrupt();
    if (time_after(nextevt, base->clk))
       	base->clk = jiffies;

jiffies has not advanced since run_timers(), so this assignment effectively
decrements base->clk by one.

base->clk is the index into the timer wheel arrays. So let's assume the
following state after the base->clk increment in run_timers():

 jiffies = 0
 base->clk = 1

A timer gets enqueued with an expiry delta of 63 ticks (which is the case
with the USB timeout and HZ=250) so the resulting bucket index is:

  base->clk + delta = 1 + 63 = 64

The timer goes into the first wheel level. The array size is 64 so it ends
up in bucket 0, which is correct as it takes 63 ticks to advance base->clk
to index into bucket 0 again.

If the cpu goes idle before jiffies advance, then the bug in the forwarding
mechanism sets base->clk back to 0, so the next invocation of run_timers()
at the next tick will index into bucket 0 and therefore expire the timer 62
ticks too early.

Instead of blindly setting base->clk to jiffies we must make the forwarding
conditional on jiffies > base->clk, but we cannot use jiffies for this as
we might run into the following issue:

  if (time_after(jiffies, base->clk) {
    if (time_after(nextevt, base->clk))
       base->clk = jiffies;

jiffies can increment between the check and the assigment far enough to
advance beyond nextevt. So we need to use a stable value for checking.

get_next_timer_interrupt() has the basej argument which is the jiffies
value snapshot taken in the calling code. So we can just that.

Thanks to Ashton for bisecting and providing trace data!

Fixes: a683f390b9 ("timers: Forward the wheel clock whenever possible")
Reported-by: Ashton Holmes <scoopta@gmail.com>
Reported-by: Michael Thayer <michael.thayer@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michal Necasek <michal.necasek@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: knut.osmundsen@oracle.com
Cc: stable@vger.kernel.org
Cc: stern@rowland.harvard.edu
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161022110552.175308322@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-25 16:32:50 +02:00
..
alarmtimer.c alarmtimer: Remove unused but set variable 2016-10-17 11:59:37 +02:00
clockevents.c clockevents: Make clockevents_subsys static 2016-07-19 10:48:06 +02:00
clocksource.c clocksource: Defer override invalidation unless clock is unstable 2016-08-31 14:43:33 -07:00
hrtimer.c time: Avoid undefined behaviour in ktime_add_safe() 2016-08-31 14:43:36 -07:00
itimer.c itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper 2016-01-17 11:13:55 +01:00
jiffies.c jiffies: Use CLOCKSOURCE_MASK instead of constant 2016-02-27 08:55:31 +01:00
Kconfig rcu: Drop RCU_USER_QS in favor of NO_HZ_FULL 2015-07-06 13:52:18 -07:00
Makefile time: Remove development rules from Kbuild/Makefile 2015-07-01 09:57:35 +02:00
ntp_internal.h ntp: Fix second_overflow's input parameter type to be 64bits 2015-12-16 16:50:56 -08:00
ntp.c ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO 2016-01-22 12:01:42 +01:00
posix-clock.c posix-clock: Fix return code on the poll method's error path 2015-12-29 11:33:06 +01:00
posix-cpu-timers.c posix_cpu_timer: Exit early when process has been reaped 2016-07-11 17:20:12 +02:00
posix-timers.c posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper 2016-01-17 11:13:55 +01:00
sched_clock.c timers, sched/clock: Clean up the code a bit 2015-03-27 08:34:01 +01:00
test_udelay.c time: Avoid timespec in udelay_test 2016-06-20 12:47:26 -07:00
tick-broadcast-hrtimer.c tick/broadcast-hrtimer: Set name of the ce_broadcast_hrtimer 2016-07-05 17:02:19 +02:00
tick-broadcast.c tick: Move the export of tick_broadcast_oneshot_control to the proper place 2015-07-14 12:01:04 +02:00
tick-common.c clockevents: Remove unused set_mode() callback 2015-09-14 11:00:55 +02:00
tick-internal.h timers: Forward the wheel clock whenever possible 2016-07-07 10:35:11 +02:00
tick-oneshot.c clockevents: Provide functions to set and get the state 2015-06-02 14:40:47 +02:00
tick-sched.c tick/nohz: Prevent stopping the tick on an offline CPU 2016-09-13 17:53:52 +02:00
tick-sched.h timers/nohz: Convert tick dependency mask to atomic_t 2016-03-29 11:52:11 +02:00
time.c time: Avoid undefined behaviour in timespec64_add_safe() 2016-08-31 14:43:35 -07:00
timeconst.bc timeconst: Update path in comment 2015-10-26 10:06:06 +09:00
timeconv.c time: Add time64_to_tm() 2016-06-20 12:47:15 -07:00
timecounter.c
timekeeping_debug.c timekeeping: Prints the amounts of time spent during suspend 2016-08-31 14:43:34 -07:00
timekeeping_internal.h clocksource: Make clocksource validation work for all clocksources 2015-12-19 15:59:57 +01:00
timekeeping.c timekeeping: Fix __ktime_get_fast_ns() regression 2016-10-05 15:44:46 +02:00
timekeeping.h hrtimer: Make offset update smarter 2015-04-22 17:06:49 +02:00
timer_list.c hrtimer: Handle remaining time proper for TIME_LOW_RES 2016-01-17 11:13:55 +01:00
timer_stats.c timer: Avoid using timespec 2016-06-20 12:47:33 -07:00
timer.c timers: Prevent base clock rewind when forwarding clock 2016-10-25 16:32:50 +02:00