mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-12-26 23:35:16 +07:00
3689c37d23
Prior to this change only two types of ATSDs were issued to the NPU: invalidates targeting a single page and invalidates targeting the whole address space. The crossover point happened at the configurable atsd_threshold which defaulted to 2M. Invalidates that size or smaller would issue per-page invalidates for the whole range. The NPU supports more invalidation sizes however: 64K, 2M, 1G, and all. These invalidates target addresses aligned to their size. 2M is a common invalidation size for GPU-enabled applications because that is a GPU page size, so reducing the number of invalidates by 32x in that case is a clear improvement. ATSD latency is high in general so now we always issue a single invalidate rather than multiple. This will over-invalidate in some cases, but for any invalidation size over 2M it matches or improves the prior behavior. There's also an improvement for single-page invalidates since the prior version issued two invalidates for that case instead of one. With this change all issued ATSDs now perform a flush, so the flush parameter has been removed from all the helpers. To show the benefit here are some performance numbers from a microbenchmark which creates a 1G allocation then uses mprotect with PROT_NONE to trigger invalidates in strides across the allocation. One NPU (1 GPU): mprotect rate (GB/s) Stride Before After Speedup 64K 5.3 5.6 5% 1M 39.3 57.4 46% 2M 49.7 82.6 66% 4M 286.6 285.7 0% Two NPUs (6 GPUs): mprotect rate (GB/s) Stride Before After Speedup 64K 6.5 7.4 13% 1M 33.4 67.9 103% 2M 38.7 93.1 141% 4M 356.7 354.6 -1% Anything over 2M is roughly the same as before since both cases issue a single ATSD. Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> |
||
---|---|---|
.. | ||
copy-paste.h | ||
eeh-powernv.c | ||
idle.c | ||
Kconfig | ||
Makefile | ||
memtrace.c | ||
npu-dma.c | ||
ocxl.c | ||
opal-async.c | ||
opal-dump.c | ||
opal-elog.c | ||
opal-flash.c | ||
opal-hmi.c | ||
opal-imc.c | ||
opal-irqchip.c | ||
opal-kmsg.c | ||
opal-lpc.c | ||
opal-memory-errors.c | ||
opal-msglog.c | ||
opal-nvram.c | ||
opal-power.c | ||
opal-powercap.c | ||
opal-prd.c | ||
opal-psr.c | ||
opal-rtc.c | ||
opal-sensor-groups.c | ||
opal-sensor.c | ||
opal-sysparam.c | ||
opal-tracepoints.c | ||
opal-wrappers.S | ||
opal-xscom.c | ||
opal.c | ||
pci-cxl.c | ||
pci-ioda-tce.c | ||
pci-ioda.c | ||
pci.c | ||
pci.h | ||
powernv.h | ||
rng.c | ||
setup.c | ||
smp.c | ||
subcore-asm.S | ||
subcore.c | ||
subcore.h | ||
vas-debug.c | ||
vas-trace.h | ||
vas-window.c | ||
vas.c | ||
vas.h |