Silicon Errata and Software Workarounds
=======================================

Author: Will Deacon <will.deacon@arm.com>
Date  : 27 November 2015

It is an unfortunate fact of life that hardware is often produced with
so-called "errata", which can cause it to deviate from the architecture
under specific circumstances. For hardware produced by ARM, these
errata are broadly classified into the following categories:

  Category A: A critical error without a viable workaround.
  Category B: A significant or critical error with an acceptable
              workaround.
  Category C: A minor error that is not expected to occur under normal
              operation.

For more information, consult one of the "Software Developers Errata
Notice" documents available on infocenter.arm.com (registration
required).

As far as Linux is concerned, Category B errata may require some special
treatment in the operating system. For example, avoiding a particular
sequence of code, or configuring the processor in a particular way. A
less common situation may require similar actions in order to declassify
a Category A erratum into a Category C erratum. These are collectively
known as "software workarounds" and are only required in the minority of
cases (e.g. those cases that both require a non-secure workaround *and*
can be triggered by Linux).

For software workarounds that may adversely impact systems unaffected by
the erratum in question, a Kconfig entry is added under "Kernel
Features" -> "ARM errata workarounds via the alternatives framework".
These are enabled by default and patched in at runtime when an affected
CPU is detected. For less-intrusive workarounds, a Kconfig option is not
available and the code is structured (preferably with a comment) in such
a way that the erratum will not be hit.

This approach can make it slightly onerous to determine exactly which
errata are worked around in an arbitrary kernel source tree, so this
file acts as a registry of software workarounds in the Linux Kernel and
will be updated when new workarounds are committed and backported to
stable kernels.

| Implementor    | Component       | Erratum ID      | Kconfig                     |
+----------------+-----------------+-----------------+-----------------------------+
clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability
The Allwinner A64 SoC is known[1] to have an unstable architectural
timer, which manifests itself most obviously in the time jumping forward
a multiple of 95 years[2][3]. This coincides with 2^56 cycles at a
timer frequency of 24 MHz, implying that the time went slightly backward
(and this was interpreted by the kernel as it jumping forward and
wrapping around past the epoch).
Investigation revealed instability in the low bits of CNTVCT at the
point a high bit rolls over. This leads to power-of-two cycle forward
and backward jumps. (Testing shows that forward jumps are about twice as
likely as backward jumps.) Since the counter value returns to normal
after an indeterminate read, each "jump" really consists of both a
forward and backward jump from the software perspective.
Unless the kernel is trapping CNTVCT reads, a userspace program is able
to read the register in a loop faster than it changes. A test program
running on all 4 CPU cores that reported jumps larger than 100 ms was
run for 13.6 hours and reported the following:
Count | Event
-------+---------------------------
9940 | jumped backward 699ms
268 | jumped backward 1398ms
1 | jumped backward 2097ms
16020 | jumped forward 175ms
6443 | jumped forward 699ms
2976 | jumped forward 1398ms
9 | jumped forward 356516ms
9 | jumped forward 357215ms
4 | jumped forward 714430ms
1 | jumped forward 3578440ms
This works out to a jump larger than 100 ms about every 5.5 seconds on
each CPU core.
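The per-core rate can be reproduced from the table. The sketch below just sums the reported counts and divides the total observation time across the 4 cores by that sum; the variable names are illustrative only.

```python
# (count, direction, jump in ms) rows copied from the CNTVCT table above.
CNTVCT_EVENTS = [
    (9940, "backward", 699),
    (268, "backward", 1398),
    (1, "backward", 2097),
    (16020, "forward", 175),
    (6443, "forward", 699),
    (2976, "forward", 1398),
    (9, "forward", 356516),
    (9, "forward", 357215),
    (4, "forward", 714430),
    (1, "forward", 3578440),
]

RUN_HOURS = 13.6   # test duration from the text
CPU_CORES = 4      # the test ran on all 4 cores

total_events = sum(count for count, _, _ in CNTVCT_EVENTS)
core_seconds = RUN_HOURS * 3600 * CPU_CORES
seconds_per_event = core_seconds / total_events

print(total_events, f"{seconds_per_event:.2f} s between jumps per core")
```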
The largest jump (almost an hour!) was the following sequence of reads:
0x0000007fffffffff → 0x00000093feffffff → 0x0000008000000000
Note that the middle bits don't necessarily all read as all zeroes or
all ones during the anomalous behavior; however the low 10 bits checked
by the function in this patch have never been observed with any other
value.
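As a rough illustration of the two points above, the sketch below converts the anomalous sequence into a jump size and tests the low 10 bits of each read. Reading the text as "the low 10 bits are always all zeroes or all ones during an anomaly" is an interpretation, and the helper names are hypothetical, not taken from the patch.

```python
COUNTER_HZ = 24_000_000        # counter frequency from the text
LOW_MASK = (1 << 10) - 1       # the low 10 bits mentioned above


def low_bits_match_pattern(value):
    """True if the low 10 bits are all zeroes or all ones."""
    low = value & LOW_MASK
    return low == 0 or low == LOW_MASK


def jump_ms(before, after):
    """Apparent jump between two reads, in whole milliseconds."""
    return (after - before) * 1000 // COUNTER_HZ


reads = [0x0000007FFFFFFFFF, 0x00000093FEFFFFFF, 0x0000008000000000]

print(jump_ms(reads[0], reads[1]), "ms forward")
print([low_bits_match_pattern(r) for r in reads])
```

Note that the good values on either side of the rollover match the pattern too, so a filter built on it discards some correct reads along with the bad ones.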
Also note that smaller jumps are much more common, with backward jumps
of 2048 (2^11) cycles observed over 400 times per second on each core.
(Of course, this is partially explained by lower bits rolling over more
frequently.) Any one of these could have caused the 95 year time skip.
Similar anomalies were observed while reading CNTPCT (after patching the
kernel to allow reads from userspace). However, the CNTPCT jumps are
much less frequent, and only small jumps were observed. The same program
as before (except now reading CNTPCT) observed after 72 hours:
Count | Event
-------+---------------------------
17 | jumped backward 699ms
52 | jumped forward 175ms
2831 | jumped forward 699ms
5 | jumped forward 1398ms
Further investigation showed that the instability in CNTPCT/CNTVCT also
affected the respective timer's TVAL register. The following values were
observed immediately after writing CNTV_TVAL to 0x10000000:
CNTVCT | CNTV_TVAL | CNTV_CVAL | CNTV_TVAL Error
--------------------+------------+--------------------+-----------------
0x000000d4a2d8bfff | 0x10003fff | 0x000000d4b2d8bfff | +0x00004000
0x000000d4a2d94000 | 0x0fffffff | 0x000000d4b2d97fff | -0x00004000
0x000000d4a2d97fff | 0x10003fff | 0x000000d4b2d97fff | +0x00004000
0x000000d4a2d9c000 | 0x0fffffff | 0x000000d4b2d9ffff | -0x00004000
The pattern of errors in CNTV_TVAL seemed to depend on exactly which
value was written to it. For example, after writing 0x10101010:
CNTVCT | CNTV_TVAL | CNTV_CVAL | CNTV_TVAL Error
--------------------+------------+--------------------+-----------------
0x000001ac3effffff | 0x1110100f | 0x000001ac4f10100f | +0x1000000
0x000001ac40000000 | 0x1010100f | 0x000001ac5110100f | -0x1000000
0x000001ac58ffffff | 0x1110100f | 0x000001ac6910100f | +0x1000000
0x000001ac66000000 | 0x1010100f | 0x000001ac7710100f | -0x1000000
0x000001ac6affffff | 0x1110100f | 0x000001ac7b10100f | +0x1000000
0x000001ac6e000000 | 0x1010100f | 0x000001ac7f10100f | -0x1000000
I was also twice able to reproduce the issue covered by Allwinner's
workaround[4], that writing to TVAL sometimes fails, and both CVAL and
TVAL are left with entirely bogus values. One was the following values:
CNTVCT | CNTV_TVAL | CNTV_CVAL
--------------------+------------+--------------------------------------
0x000000d4a2d6014c | 0x8fbd5721 | 0x000000d132935fff (615s in the past)
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
========================================================================
Because the CPU can read the CNTPCT/CNTVCT registers faster than they
change, performing two reads of the register and comparing the high bits
(like other workarounds) is not a workable solution. And because the
timer can jump both forward and backward, no pair of reads can
distinguish a good value from a bad one. The only way to guarantee a
good value from consecutive reads would be to read _three_ times, and
take the middle value only if the three values are 1) each unique and
2) increasing. This takes at minimum 3 counter cycles (125 ns), or more
if an anomaly is detected.
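The three-read scheme described above could look roughly like the following userspace model. It is a sketch only: `read` stands in for a counter register read, and `max_tries` is an added safety bound that the text does not mention.

```python
def read_counter_three(read, max_tries=16):
    """Return the middle of three reads that are unique and increasing.

    `read` is any zero-argument callable returning the counter value
    (a stand-in for a CNTVCT/CNTPCT read in this model).
    """
    for _ in range(max_tries):
        a, b, c = read(), read(), read()
        if a < b < c:
            # A bad middle value would be above c (forward jump) or
            # below a (backward jump), so b can be trusted here.
            return b
    raise RuntimeError("counter did not settle")


# Replay the anomalous sequence from above, followed by normal reads.
samples = iter([
    0x7FFFFFFFFF, 0x93FEFFFFFF, 0x8000000000,   # rejected: middle is too high
    0x8000000001, 0x8000000002, 0x8000000003,   # accepted
])
print(hex(read_counter_three(lambda: next(samples))))
```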
However, since there is a distinct pattern to the bad values, we can
optimize the common case (1022/1024 of the time) to a single read by
simply ignoring values that match the error pattern. This still takes no
more than 3 cycles in the worst case, and requires much less code. As an
additional safety check, we still limit the loop iteration to the number
of max-frequency (1.2 GHz) CPU cycles in three 24 MHz counter periods.
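A minimal model of the filtered read, assuming the error pattern is "low 10 bits all zeroes or all ones" as described earlier. The function and constant names are hypothetical; the retry bound is derived from the 1.2 GHz and 24 MHz figures in the text.

```python
CPU_MAX_HZ = 1_200_000_000     # maximum CPU frequency from the text
COUNTER_HZ = 24_000_000        # counter frequency from the text
LOW_MASK = (1 << 10) - 1       # low 10 bits

# CPU cycles in three counter periods, used as the loop bound.
MAX_RETRIES = 3 * CPU_MAX_HZ // COUNTER_HZ


def read_counter_filtered(read):
    """Re-read while the low 10 bits are all zeroes or all ones.

    `read` is a zero-argument callable standing in for the register read.
    Adding 1 maps all-ones to 0 and all-zeroes to 1, so one comparison
    covers both cases.
    """
    value = read()
    retries = MAX_RETRIES
    while ((value + 1) & LOW_MASK) <= 1 and retries > 0:
        value = read()
        retries -= 1
    return value


samples = iter([0x7FFFFFFFFF, 0x93FEFFFFFF, 0x8000000000, 0x8000000001])
print(MAX_RETRIES, hex(read_counter_filtered(lambda: next(samples))))
```

Only 2 of every 1024 possible low-bit patterns trigger a re-read, which is where the 1022/1024 single-read figure comes from.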
For the TVAL registers, the simple solution is to not use them. Instead,
read or write the CVAL and calculate the TVAL value in software.
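Computing TVAL in software amounts to the usual relationship between the two registers: TVAL is the signed 32-bit difference between CVAL and the current counter value. The sketch below models that relationship with plain integers; the helper names are hypothetical and the counter value is passed in rather than read from hardware.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def sign_extend32(value):
    """Interpret the low 32 bits of `value` as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def tval_from_cval(cval, counter):
    """Emulate a TVAL read: low 32 bits of (CVAL - counter)."""
    return (cval - counter) & MASK32


def cval_for_tval(tval, counter):
    """Emulate a TVAL write: the CVAL to program for a given TVAL."""
    return (counter + sign_extend32(tval)) & MASK64


# First row of the 0x10000000 table above, with the counter as it was
# when the write took effect.
counter = 0x000000D4A2D8BFFF
cval = cval_for_tval(0x10000000, counter)
print(hex(cval), hex(tval_from_cval(cval, counter)))
```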
Although the manufacturer is aware of at least part of the erratum[4],
there is no official name for it. For now, use the kernel-internal name
"UNKNOWN1".
[1]: https://github.com/armbian/build/commit/a08cd6fe7ae9
[2]: https://forum.armbian.com/topic/3458-a64-datetime-clock-issue/
[3]: https://irclog.whitequark.org/linux-sunxi/2018-01-26
[4]: https://github.com/Allwinner-Homlet/H6-BSP4.9-linux/blob/master/drivers/clocksource/arm_arch_timer.c#L272
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
| Allwinner      | A64/R18         | UNKNOWN1        | SUN50I_ERRATUM_UNKNOWN1     |
|                |                 |                 |                             |
| ARM            | Cortex-A53      | #826319         | ARM64_ERRATUM_826319        |
| ARM            | Cortex-A53      | #827319         | ARM64_ERRATUM_827319        |
| ARM            | Cortex-A53      | #824069         | ARM64_ERRATUM_824069        |
| ARM            | Cortex-A53      | #819472         | ARM64_ERRATUM_819472        |
| ARM            | Cortex-A53      | #845719         | ARM64_ERRATUM_845719        |
| ARM            | Cortex-A53      | #843419         | ARM64_ERRATUM_843419        |
| ARM            | Cortex-A57      | #832075         | ARM64_ERRATUM_832075        |
| ARM            | Cortex-A57      | #852523         | N/A                         |
| ARM            | Cortex-A57      | #834220         | ARM64_ERRATUM_834220        |
| ARM            | Cortex-A72      | #853709         | N/A                         |
| ARM            | Cortex-A73      | #858921         | ARM64_ERRATUM_858921        |
| ARM            | Cortex-A55      | #1024718        | ARM64_ERRATUM_1024718       |
| ARM            | Cortex-A76      | #1188873,1418040| ARM64_ERRATUM_1418040       |
| ARM            | Cortex-A76      | #1165522        | ARM64_ERRATUM_1165522       |
| ARM            | Cortex-A76      | #1286807        | ARM64_ERRATUM_1286807       |
| ARM            | Cortex-A76      | #1463225        | ARM64_ERRATUM_1463225       |
| ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
| ARM            | MMU-500         | #841119,826419  | N/A                         |
|                |                 |                 |                             |
| Cavium         | ThunderX ITS    | #22375,24313    | CAVIUM_ERRATUM_22375        |
| Cavium         | ThunderX ITS    | #23144          | CAVIUM_ERRATUM_23144        |
| Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154        |
| Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456        |
| Cavium         | ThunderX Core   | #30115          | CAVIUM_ERRATUM_30115        |
| Cavium         | ThunderX SMMUv2 | #27704          | N/A                         |
| Cavium         | ThunderX2 SMMUv3| #74             | N/A                         |
| Cavium         | ThunderX2 SMMUv3| #126            | N/A                         |
|                |                 |                 |                             |
| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585         |
|                |                 |                 |                             |
| Hisilicon      | Hip0{5,6,7}     | #161010101      | HISILICON_ERRATUM_161010101 |
| Hisilicon      | Hip0{6,7}       | #161010701      | N/A                         |
| Hisilicon      | Hip07           | #161600802      | HISILICON_ERRATUM_161600802 |
| Hisilicon      | Hip08 SMMU PMCG | #162001800      | N/A                         |
|                |                 |                 |                             |
| Qualcomm Tech. | Kryo/Falkor v1  | E1003           | QCOM_FALKOR_ERRATUM_1003    |
| Qualcomm Tech. | Falkor v1       | E1009           | QCOM_FALKOR_ERRATUM_1009    |
| Qualcomm Tech. | QDF2400 ITS     | E0065           | QCOM_QDF2400_ERRATUM_0065   |
| Qualcomm Tech. | Falkor v{1,2}   | E1041           | QCOM_FALKOR_ERRATUM_1041    |
| Fujitsu        | A64FX           | E#010001        | FUJITSU_ERRATUM_010001      |