ARM64 implementation of ip_fast_csum() do most of the work
in 128 or 64 bit and call csum_fold() to finalize. csum_fold()
itself take a __wsum argument, to insure that this value is
always a 32bit native-order value.
Fix this by adding the sadly needed '__force' to cast the native
'sum' to the type '__wsum'.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
AArch64 is capable of 128-bit memory accesses without alignment
restrictions, which makes it both possible and highly practical to slurp
up a typical 20-byte IP header in just 2 loads. Implement our own
version of ip_fast_checksum() to take advantage of that, resulting in
considerably fewer instructions and memory accesses than the generic
version. We can also get more optimal code generation for csum_fold() by
defining it a slightly different way round from the generic version, so
throw that into the mix too.
Suggested-by: Luke Starrett <luke.starrett@broadcom.com>
Acked-by: Luke Starrett <luke.starrett@broadcom.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>