linux_dsm_epyc7002/Documentation
Samuel Holland c950ca8c35 clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability
The Allwinner A64 SoC is known[1] to have an unstable architectural
timer, which manifests itself most obviously in the time jumping forward
a multiple of 95 years[2][3]. This coincides with 2^56 cycles at a
timer frequency of 24 MHz, implying that the time went slightly backward
(and this was interpreted by the kernel as it jumping forward and
wrapping around past the epoch).

Investigation revealed instability in the low bits of CNTVCT at the
point a high bit rolls over. This leads to power-of-two cycle forward
and backward jumps. (Testing shows that forward jumps are about twice as
likely as backward jumps.) Since the counter value returns to normal
after an indeterminate read, each "jump" really consists of both a
forward and backward jump from the software perspective.

Unless the kernel is trapping CNTVCT reads, a userspace program is able
to read the register in a loop faster than it changes. A test program
running on all 4 CPU cores that reported jumps larger than 100 ms was
run for 13.6 hours and reported the following:

 Count | Event
-------+---------------------------
  9940 | jumped backward      699ms
   268 | jumped backward     1398ms
     1 | jumped backward     2097ms
 16020 | jumped forward       175ms
  6443 | jumped forward       699ms
  2976 | jumped forward      1398ms
     9 | jumped forward    356516ms
     9 | jumped forward    357215ms
     4 | jumped forward    714430ms
     1 | jumped forward   3578440ms

This works out to a jump larger than 100 ms about every 5.5 seconds on
each CPU core.

The largest jump (almost an hour!) was the following sequence of reads:
    0x0000007fffffffff → 0x00000093feffffff → 0x0000008000000000

Note that the middle bits don't necessarily all read as all zeroes or
all ones during the anomalous behavior; however the low 10 bits checked
by the function in this patch have never been observed with any other
value.

Also note that smaller jumps are much more common, with backward jumps
of 2048 (2^11) cycles observed over 400 times per second on each core.
(Of course, this is partially explained by lower bits rolling over more
frequently.) Any one of these could have caused the 95 year time skip.

Similar anomalies were observed while reading CNTPCT (after patching the
kernel to allow reads from userspace). However, the CNTPCT jumps are
much less frequent, and only small jumps were observed. The same program
as before (except now reading CNTPCT) observed after 72 hours:

 Count | Event
-------+---------------------------
    17 | jumped backward      699ms
    52 | jumped forward       175ms
  2831 | jumped forward       699ms
     5 | jumped forward      1398ms

Further investigation showed that the instability in CNTPCT/CNTVCT also
affected the respective timer's TVAL register. The following values were
observed immediately after writing CNVT_TVAL to 0x10000000:

 CNTVCT             | CNTV_TVAL  | CNTV_CVAL          | CNTV_TVAL Error
--------------------+------------+--------------------+-----------------
 0x000000d4a2d8bfff | 0x10003fff | 0x000000d4b2d8bfff | +0x00004000
 0x000000d4a2d94000 | 0x0fffffff | 0x000000d4b2d97fff | -0x00004000
 0x000000d4a2d97fff | 0x10003fff | 0x000000d4b2d97fff | +0x00004000
 0x000000d4a2d9c000 | 0x0fffffff | 0x000000d4b2d9ffff | -0x00004000

The pattern of errors in CNTV_TVAL seemed to depend on exactly which
value was written to it. For example, after writing 0x10101010:

 CNTVCT             | CNTV_TVAL  | CNTV_CVAL          | CNTV_TVAL Error
--------------------+------------+--------------------+-----------------
 0x000001ac3effffff | 0x1110100f | 0x000001ac4f10100f | +0x1000000
 0x000001ac40000000 | 0x1010100f | 0x000001ac5110100f | -0x1000000
 0x000001ac58ffffff | 0x1110100f | 0x000001ac6910100f | +0x1000000
 0x000001ac66000000 | 0x1010100f | 0x000001ac7710100f | -0x1000000
 0x000001ac6affffff | 0x1110100f | 0x000001ac7b10100f | +0x1000000
 0x000001ac6e000000 | 0x1010100f | 0x000001ac7f10100f | -0x1000000

I was also twice able to reproduce the issue covered by Allwinner's
workaround[4], that writing to TVAL sometimes fails, and both CVAL and
TVAL are left with entirely bogus values. One was the following values:

 CNTVCT             | CNTV_TVAL  | CNTV_CVAL
--------------------+------------+--------------------------------------
 0x000000d4a2d6014c | 0x8fbd5721 | 0x000000d132935fff (615s in the past)
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

========================================================================

Because the CPU can read the CNTPCT/CNTVCT registers faster than they
change, performing two reads of the register and comparing the high bits
(like other workarounds) is not a workable solution. And because the
timer can jump both forward and backward, no pair of reads can
distinguish a good value from a bad one. The only way to guarantee a
good value from consecutive reads would be to read _three_ times, and
take the middle value only if the three values are 1) each unique and
2) increasing. This takes at minimum 3 counter cycles (125 ns), or more
if an anomaly is detected.

However, since there is a distinct pattern to the bad values, we can
optimize the common case (1022/1024 of the time) to a single read by
simply ignoring values that match the error pattern. This still takes no
more than 3 cycles in the worst case, and requires much less code. As an
additional safety check, we still limit the loop iteration to the number
of max-frequency (1.2 GHz) CPU cycles in three 24 MHz counter periods.

For the TVAL registers, the simple solution is to not use them. Instead,
read or write the CVAL and calculate the TVAL value in software.

Although the manufacturer is aware of at least part of the erratum[4],
there is no official name for it. For now, use the kernel-internal name
"UNKNOWN1".

[1]: https://github.com/armbian/build/commit/a08cd6fe7ae9
[2]: https://forum.armbian.com/topic/3458-a64-datetime-clock-issue/
[3]: https://irclog.whitequark.org/linux-sunxi/2018-01-26
[4]: https://github.com/Allwinner-Homlet/H6-BSP4.9-linux/blob/master/drivers/clocksource/arm_arch_timer.c#L272

Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-02-23 12:13:45 +01:00
..
ABI for-linus-20190112 2019-01-12 13:40:51 -08:00
accelerators
accounting psi: cgroup support 2018-10-26 16:26:32 -07:00
acpi
admin-guide A handful of late-arriving documentation fixes. 2019-01-05 18:35:02 -08:00
aoe
arm Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
arm64 clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability 2019-02-23 12:13:45 +01:00
auxdisplay
backlight
block block: doc: add slice_idle_us to bfq documentation 2019-01-09 07:38:48 -07:00
blockdev zram: idle writeback fixes and cleanup 2019-01-08 17:15:10 -08:00
bpf
bus-devices
cdrom Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
cgroup-v1 Merge branch 'for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2018-10-25 17:15:46 -07:00
cma
connector
console
core-api A handful of late-arriving documentation fixes. 2019-01-05 18:35:02 -08:00
cpu-freq Documentation: cpu-freq: Frequencies aren't always sorted 2018-11-07 13:29:04 +01:00
cpuidle Documentation: admin-guide: PM: Add cpuidle document 2018-12-03 10:03:36 +01:00
crypto crypto: skcipher - remove remnants of internal IV generators 2018-12-23 11:52:45 +08:00
dev-tools A fairly normal cycle for documentation stuff. We have a new 2018-12-29 11:21:49 -08:00
device-mapper Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
devicetree dt-bindings: reset: uniphier: Add AHCI core reset description 2019-01-07 16:38:51 +01:00
doc-guide doc:process: add links where missing 2018-12-06 10:21:19 -07:00
driver-api pci-v4.21-changes 2019-01-05 17:57:34 -08:00
driver-model Documentation: driver core: remove use of BUS_ATTR 2019-01-08 15:17:45 +01:00
early-userspace Correct gen_init_cpio tool's documentation 2018-11-25 12:25:53 -07:00
EDID Docs/EDID: Calculate CRC while building the code 2018-11-06 07:36:22 -07:00
extcon
fault-injection
fb This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
features Documentation/features: Add csky kernel features 2019-01-07 22:22:16 +08:00
filesystems Documentation: driver core: remove use of BUS_ATTR 2019-01-08 15:17:45 +01:00
firmware_class
fmc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
fpga
gpio Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
gpu A fairly normal cycle for documentation stuff. We have a new 2018-12-29 11:21:49 -08:00
hid HID: doc: fix wrong data structure reference for UHID_OUTPUT 2018-12-18 14:55:22 +01:00
hwmon hwmon: Introduce SENSOR_DEVICE_ATTR_{RO, RW, WO} and variants 2018-12-16 15:13:22 -08:00
i2c i2c: add i2c bus driver for NVIDIA GPU 2018-11-09 17:46:43 +01:00
ia64
ide Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
iio
infiniband
input Input: add REL_WHEEL_HI_RES and REL_HWHEEL_HI_RES 2018-12-07 16:27:11 +01:00
ioctl seccomp: add a return code to trap to userspace 2018-12-11 16:28:41 -08:00
isdn Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
kbuild kbuild: generate asm-generic wrappers if mandatory headers are missing 2019-01-06 09:46:51 +09:00
kdump
kernel-hacking
laptops platform-drivers-x86 for v4.20-1 2018-11-01 08:42:21 -07:00
leds Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
lightnvm
livepatch
locking This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
m68k Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
maintainer
md
media A fairly normal cycle for documentation stuff. We have a new 2018-12-29 11:21:49 -08:00
memory-devices
mic
mips Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
misc-devices
mmc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
mtd Documentation: mtd: remove stale pxa3xx NAND controller documentation 2018-09-04 23:37:38 +02:00
namespaces
netlabel Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
networking Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-03 12:53:47 -08:00
nfc
nios2
nvdimm libnvdimm/security: Add documentation for nvdimm security support 2018-12-21 12:44:41 -08:00
nvmem Documentation: nvmem: document cell tables and lookup entries 2018-09-28 15:14:54 +02:00
openrisc
parisc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
PCI pci-v4.20-changes 2018-10-25 06:50:48 -07:00
pcmcia
perf Documentation: perf: Add documentation for ThunderX2 PMU uncore driver 2018-12-06 12:29:47 +00:00
phy
platform
power Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
powerpc powerpc/fadump: Reservationless firmware assisted dump 2018-12-21 11:32:49 +11:00
pps
process docs: fix Co-Developed-by docs 2019-01-04 13:13:48 -08:00
pti
ptp
rapidio
RCU doc: Fix "struction" typo in RCU memory-ordering documentation 2018-11-12 08:56:25 -08:00
riscv
s390 Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
scheduler This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
scsi SCSI misc on 20181224 2018-12-28 14:48:06 -08:00
security Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-01-02 09:43:14 -08:00
serial Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
sh sh: remove board_time_init() callback 2018-12-18 16:13:04 +01:00
sound Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
sparc
sphinx
sphinx-static docs: improve readability for people with poorer eyesight 2018-10-07 09:16:50 -06:00
spi Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
sysctl kernel/sysctl: add panic_print into sysctl 2019-01-04 13:13:47 -08:00
target
thermal Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
timers Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
trace Merge branches 'pm-cpuidle', 'pm-cpufreq' and 'pm-sleep' 2019-01-11 10:09:51 +01:00
translations doc🇮🇹 add some process/* translations 2018-12-06 10:11:40 -07:00
usb Documentation/usb: Fix typo 2018-11-26 16:56:34 +01:00
userspace-api Merge branch 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-01-02 09:48:13 -08:00
virtual Documentation/virtual/kvm: Update URL for AMD SEV API specification 2019-01-11 18:38:07 +01:00
vm A fairly normal cycle for documentation stuff. We have a new 2018-12-29 11:21:49 -08:00
w1 Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
watchdog watchdog: docs: kernel-api: don't reference removed functions 2018-12-24 13:15:06 +01:00
wimax
x86 x86/cache: Rename config option to CONFIG_X86_RESCTRL 2019-01-09 10:29:03 +01:00
xilinx Documentation: xilinx: Add documentation for eemi APIs 2018-10-09 13:26:05 +02:00
xtensa
.gitignore
atomic_bitops.txt
atomic_t.txt
bt8xxgpio.txt
btmrvl.txt
bus-virt-phys-mapping.txt
Changes
clearing-warn-once.txt
CodingStyle
conf.py This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt
digsig.txt
DMA-API-HOWTO.txt
DMA-API.txt dma-mapping: deprecate dma_zalloc_coherent 2018-12-20 08:14:09 +01:00
DMA-attributes.txt
DMA-ISA-LPC.txt
docutils.conf
dontdiff
efi-stub.txt efi_stub: update documentation on dtb= parameter 2018-09-09 14:46:44 -06:00
eisa.txt
flexible-arrays.txt
futex-requeue-pi.txt
gcc-plugins.txt
highuid.txt
hw_random.txt
hwspinlock.txt
index.rst docs: tidy up TOCs and refs to license-rules.rst 2018-08-31 16:50:50 -06:00
intel_txt.txt
Intel-IOMMU.txt
io_ordering.txt
io-mapping.txt
iostats.txt
IPMI.txt
IRQ-affinity.txt
IRQ-domain.txt
IRQ.txt
irqflags-tracing.txt
isa.txt
isapnp.txt
kernel-per-CPU-kthreads.txt doc: Update removal of RCU-bh/sched update machinery 2018-08-30 10:59:48 -07:00
kobject.txt kref/kobject: Improve documentation 2018-12-06 13:57:03 +01:00
kprobes.txt
kref.txt
ldm.txt
lockup-watchdogs.txt
logo.gif
logo.txt
lsm.txt
lzo.txt
mailbox.txt
Makefile kbuild: Add support for DT binding schema checks 2018-12-13 09:41:32 -06:00
memory-barriers.txt Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
men-chameleon-bus.txt
nommu-mmap.txt
ntb.txt
numastat.txt
padata.txt
parport-lowlevel.txt
percpu-rw-semaphore.txt
phy.txt
pi-futex.txt
pnp.txt
preempt-locking.txt Documentation: preempt-locking: Use better example 2018-10-12 11:35:47 -06:00
pwm.txt
rbtree.txt
remoteproc.txt
rfkill.txt
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rtc.txt
SAK.txt
sgi-ioc4.txt
siphash.txt
SM501.txt
smsc_ece1099.txt
speculation.txt
static-keys.txt Documentation: Use "while" instead of "whilst" 2018-11-20 09:30:43 -07:00
SubmittingPatches
svga.txt
switchtec.txt NTB: switchtec_ntb: Update switchtec documentation with prerequisites for NTB 2018-10-11 11:28:53 -05:00
sync_file.txt
tee.txt
this_cpu_ops.txt
unaligned-memory-access.txt
vfio-mediated-device.txt
vfio.txt
video-output.txt
xillybus.txt
xz.txt
zorro.txt