mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2025-01-20 20:47:25 +07:00
678cce4019
The x86_64 implementation of Poly1305 produces the wrong result on some inputs because poly1305_4block_avx2() incorrectly assumes that when partially reducing the accumulator, the bits carried from limb 'd4' to limb 'h0' fit in a 32-bit integer. This is true for poly1305-generic which processes only one block at a time. However, it's not true for the AVX2 implementation, which processes 4 blocks at a time and therefore can produce intermediate limbs about 4x larger. Fix it by making the relevant calculations use 64-bit arithmetic rather than 32-bit. Note that most of the carries already used 64-bit arithmetic, but the d4 -> h0 carry was different for some reason. To be safe I also made the same change to the corresponding SSE2 code, though that only operates on 1 or 2 blocks at a time. I don't think it's really needed for poly1305_block_sse2(), but it doesn't hurt because it's already x86_64 code. It *might* be needed for poly1305_2block_sse2(), but overflows aren't easy to reproduce there. This bug was originally detected by my patches that improve testmgr to fuzz algorithms against their generic implementation. But also add a test vector which reproduces it directly (in the AVX2 case). Fixes: |
||
---|---|---|
.. | ||
boot | ||
configs | ||
crypto | ||
entry | ||
events | ||
hyperv | ||
ia32 | ||
include | ||
kernel | ||
kvm | ||
lib | ||
math-emu | ||
mm | ||
net | ||
oprofile | ||
pci | ||
platform | ||
power | ||
purgatory | ||
ras | ||
realmode | ||
tools | ||
um | ||
video | ||
xen | ||
.gitignore | ||
Kbuild | ||
Kconfig | ||
Kconfig.cpu | ||
Kconfig.debug | ||
Makefile | ||
Makefile_32.cpu | ||
Makefile.um |