linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-05 01:36:47 +07:00

Doug Anderson 33298ef6d8 ARM: 8505/1: dma-mapping: Optimize allocation	2016-02-11 15:33:37 +00:00

__iommu_alloc_buffer() is expected to be called to allocate pretty sizeable buffers. Upon simple tests of video I saw it trying to allocate 4,194,304 bytes. The function tries to allocate large chunks in order to optimize IOMMU TLB usage.

The current function is very, very slow.

One problem is the way it keeps trying and trying to allocate big chunks. Imagine a very fragmented memory that has 4M free but no contiguous pages at all. Further imagine allocating 4M (1024 pages). We'll do the following memory allocations:
- For page 1:
  - Try to allocate order 10 (no retry)
  - Try to allocate order 9 (no retry)
  - ...
  - Try to allocate order 0 (with retry, but not needed)
- For page 2:
  - Try to allocate order 9 (no retry)
  - Try to allocate order 8 (no retry)
  - ...
  - Try to allocate order 0 (with retry, but not needed)
- ...
- ...

Total number of alloc() calls for this case is:
  sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
  => 9228

The above is obviously the worst case, but given how slow alloc can be we really want to try to avoid even somewhat bad cases. I timed the old code with a device under memory pressure and it wasn't hard to see it take more than 120 seconds to allocate 4 megs of memory! (NOTE: testing was done on kernel 3.14, so possibly mainline would behave differently).

A second problem is that allocating big chunks under memory pressure is just not a great idea unless we really need them. We can make do pretty well with smaller chunks, so it's probably wise to leave bigger chunks for other users once memory pressure is on.

Let's adjust the allocation like this:

1. If a big chunk fails, stop trying too hard and bump down to lower order allocations.
2. Don't try useless orders. The whole point of big chunks is to optimize the TLB and it can really only make use of 2M, 1M, 64K and 4K sizes.

We'll still tend to eat up a bunch of big chunks, but that might be the right answer for some users. A future patch could possibly add a new DMA_ATTR that would let the caller decide that TLB optimization isn't important and that we should use smaller chunks. Presumably this would be a sane strategy for some callers.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
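
The worst-case count in the commit message can be reproduced with a small simulation. The sketch below is not the kernel code. It models a fully fragmented memory in which every allocation above order 0 fails, and counts attempts for the old strategy and for one reading of the proposed strategy. The function names are hypothetical, the order list (9, 8, 4, 0) is derived from the 2M, 1M, 64K and 4K sizes assuming 4K pages, and "stop trying too hard" is interpreted here as never retrying an order once it has failed.

```python
def old_alloc_calls(total_pages):
    """Count allocation attempts for the old strategy when only order 0 succeeds."""
    calls = 0
    remaining = total_pages
    while remaining > 0:
        # Highest order that still fits in what is left: floor(log2(remaining)).
        top_order = remaining.bit_length() - 1
        # Orders top_order..1 all fail, then order 0 succeeds.
        calls += top_order + 1
        remaining -= 1
    return calls


def new_alloc_calls(total_pages, orders=(9, 8, 4, 0)):
    """Count attempts when only useful orders are tried and failed orders are dropped.

    Assumption: once an order fails it is never tried again for this buffer.
    """
    calls = 0
    remaining = total_pages
    idx = 0  # index into `orders`; only ever moves toward smaller orders
    while remaining > 0:
        # Skip orders that are larger than what is left to allocate.
        while (1 << orders[idx]) > remaining:
            idx += 1
        while True:
            calls += 1
            if orders[idx] == 0:
                break  # a single page always succeeds in this model
            idx += 1  # the larger order failed; drop to the next one for good
        remaining -= 1
    return calls


if __name__ == "__main__":
    print("old strategy:", old_alloc_calls(1024))
    print("new strategy:", new_alloc_calls(1024))
```

Under these assumptions the old strategy makes 9228 attempts for a 1024-page buffer, matching the figure in the commit message, while the modelled new strategy makes 1027: three failed large-order attempts followed by one order-0 allocation per page.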
alpha	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
arc	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
arm	ARM: 8505/1: dma-mapping: Optimize allocation	2016-02-11 15:33:37 +00:00
arm64	ARM: SoC support for Tegra platforms for v4.5	2016-01-22 17:30:52 -08:00
avr32	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
blackfin	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
c6x	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
cris	Merge branch 'akpm' (patches from Andrew)	2016-01-21 12:32:08 -08:00
frv	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
h8300	Merge branch 'akpm' (patches from Andrew)	2016-01-21 12:32:08 -08:00
hexagon	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
ia64	[IA64] Enable copy_file_range syscall for ia64	2016-01-22 14:20:01 -08:00
m32r	m32r: fix m32104ut_defconfig build fail	2016-01-14 16:00:49 -08:00
m68k	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
metag	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
microblaze	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
mips	Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus	2016-01-24 12:50:56 -08:00
mn10300	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
nios2	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
openrisc	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
parisc	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
powerpc	wrappers for ->i_mutex access	2016-01-22 18:04:28 -05:00
s390	wrappers for ->i_mutex access	2016-01-22 18:04:28 -05:00
score
sh	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
sparc	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
tile	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
um	um: kill pfn_t	2016-01-15 17:56:32 -08:00
unicore32	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00
x86	pmem: add wb_cache_pmem() to the PMEM API	2016-01-22 17:02:18 -08:00
xtensa	dma-mapping: remove <asm-generic/dma-coherent.h>	2016-01-20 17:09:18 -08:00
.gitignore
Kconfig	dma-mapping: always provide the dma_map_ops based implementation	2016-01-20 17:09:18 -08:00