linux_dsm_epyc7002/arch/arm/mm
Doug Anderson 14d3ae2efe ARM: 8507/1: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc
If we know that TLB efficiency will not be an issue when memory is
accessed then it's not terribly important to allocate big chunks of
memory.  The whole point of allocating the big chunks was that it would
make TLB usage efficient.

As Marek Szyprowski indicated:
    Please note that mapping memory with larger pages significantly
    improves performance, especially when IOMMU has a little TLB
    cache. This can be easily observed when multimedia devices do
    processing of RGB data with 90/270 degree rotation
Image rotation is distinctly an operation that needs to bounce around
through memory, so it makes sense that TLB efficiency is important
there.

Video decoding, on the other hand, is a fairly sequential operation.
During video decoding it's not expected that we'll be jumping all over
memory.  Decoding video is also pretty heavy and the TLB misses aren't a
huge deal.  Presumably most HW video acceleration users of dma-mapping
will not care about huge pages and will set DMA_ATTR_ALLOC_SINGLE_PAGES.

Allocating big chunks of memory is quite expensive, especially if we're
doing it repeadly and memory is full.  In one (out of tree) usage model
it is common that arm_iommu_alloc_attrs() is called 16 times in a row,
each one trying to allocate 4 MB of memory.  This is called whenever the
system encounters a new video, which could easily happen while the
memory system is stressed out.  In fact, on certain social media
websites that auto-play video and have infinite scrolling, it's quite
common to see not just one of these 16x4MB allocations but 2 or 3 right
after another.  Asking the system even to do a small amount of extra
work to give us big chunks in this case is just not a good use of time.

Allocating big chunks of memory is also expensive indirectly.  Even if
we ask the system not to do ANY extra work to allocate _our_ memory,
we're still potentially eating up all big chunks in the system.
Presumably there are other users in the system that aren't quite as
flexible and that actually need these big chunks.  By eating all the big
chunks we're causing extra work for the rest of the system.  We also may
start making other memory allocations fail.  While the system may be
robust to such failures (as is the case with dwc2 USB trying to allocate
buffers for Ethernet data and with WiFi trying to allocate buffers for
WiFi data), it is yet another big performance hit.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-02-11 15:33:38 +00:00
..
abort-ev4.S ARM: entry: provide uaccess assembly macro hooks 2015-08-26 20:27:02 +01:00
abort-ev4t.S ARM: entry: data abort: tail-call the main data abort handler 2011-07-02 10:56:11 +01:00
abort-ev5t.S ARM: entry: provide uaccess assembly macro hooks 2015-08-26 20:27:02 +01:00
abort-ev5tj.S ARM: entry: provide uaccess assembly macro hooks 2015-08-26 20:27:02 +01:00
abort-ev6.S ARM: entry: provide uaccess assembly macro hooks 2015-08-26 20:27:02 +01:00
abort-ev7.S ARM: entry: provide uaccess assembly macro hooks 2015-08-26 20:27:02 +01:00
abort-lv4t.S ARM: entry: provide uaccess assembly macro hooks 2015-08-26 20:27:02 +01:00
abort-macro.S ARM: entry: provide uaccess assembly macro hooks 2015-08-26 20:27:02 +01:00
abort-nommu.S ARM: entry: data abort: tail-call the main data abort handler 2011-07-02 10:56:11 +01:00
alignment.c uaccess: reimplement probe_kernel_address() using probe_kernel_read() 2015-11-05 19:34:48 -08:00
cache-aurora-l2.h ARM: 7547/4: cache-l2x0: add support for Aurora L2 cache ctrl 2012-11-06 19:47:35 +00:00
cache-fa.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
cache-feroceon-l2.c ARM: 8416/1: Feroceon: use of_iomap() to map register base 2015-08-18 14:00:30 +01:00
cache-l2x0.c ARM: 8482/1: l2x0: make it possible to disable outer sync from DT 2015-12-22 12:15:53 +00:00
cache-nop.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
cache-tauros2.c ARM: convert printk(KERN_* to pr_* 2014-11-21 15:24:50 +00:00
cache-tauros3.h ARM: 7922/1: l2x0: add Marvell Tauros3 support 2013-12-29 12:32:47 +00:00
cache-uniphier.c ARM: 8462/1: cache-uniphier: use common API to find the next level cache 2015-12-03 00:03:09 +00:00
cache-v4.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
cache-v4wb.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
cache-v4wt.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
cache-v6.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
cache-v7.S ARM: cache-v7: optimise test for Cortex A9 r0pX devices 2015-04-14 22:26:52 +01:00
cache-xsc3l2.c ARM: move CP15 definitions to separate header file 2012-03-28 18:30:01 +01:00
context.c ARM: 8465/1: mm: keep reserved ASIDs in sync with mm after multiple rollovers 2015-12-02 23:57:54 +00:00
copypage-fa.c arm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:14 +08:00
copypage-feroceon.c arm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:14 +08:00
copypage-v4mc.c Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm 2012-03-29 16:53:48 -07:00
copypage-v4wb.c arm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:14 +08:00
copypage-v4wt.c arm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:14 +08:00
copypage-v6.c ARM: 8236/1: mm: fix discard_old_kernel_data 2014-12-03 16:00:04 +00:00
copypage-xsc3.c arm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:14 +08:00
copypage-xscale.c Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm 2012-03-29 16:53:48 -07:00
dma-mapping.c ARM: 8507/1: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc 2016-02-11 15:33:38 +00:00
dma.h ARM: reduce visibility of dmac_* functions 2015-08-01 22:25:04 +01:00
dump.c ARM: 8249/1: mm: dump: don't skip regions 2015-01-07 20:33:33 +00:00
extable.c ARM: 7876/1: clear Thumb-2 IT state on exception handling 2013-11-07 00:15:49 +00:00
fault-armv.c ARM: convert printk(KERN_* to pr_* 2014-11-21 15:24:50 +00:00
fault.c ARM: 8447/1: catch pending imprecise abort on unmask 2015-10-19 17:08:33 +01:00
fault.h ARM: 8447/1: catch pending imprecise abort on unmask 2015-10-19 17:08:33 +01:00
flush.c mm: differentiate page_mapped() from page_mapcount() for compound pages 2016-01-15 17:56:32 -08:00
fsr-2level.c ARM: LPAE: Move the FSR definitions to separate files 2011-12-08 10:30:37 +00:00
fsr-3level.c ARM: mm: Transparent huge page support for LPAE systems. 2013-06-04 16:52:38 +01:00
highmem.c kmap_atomic_to_page() has no users, remove it 2015-11-09 15:11:24 -08:00
hugetlbpage.c mm/hugetlb: reduce arch dependent code about huge_pmd_unshare 2015-06-24 17:49:41 -07:00
idmap.c ARM: make virt_to_idmap() return unsigned long 2016-02-08 15:47:28 +00:00
init.c ARM: 8501/1: mm: flip priority of CONFIG_DEBUG_RODATA 2016-02-08 15:56:45 +00:00
iomap.c arm/PCI: remove arch pci_flags definition 2012-02-23 20:18:56 -07:00
ioremap.c ARM: add support for generic early_ioremap/early_memremap 2015-12-13 19:18:28 +01:00
Kconfig ARM: 8501/1: mm: flip priority of CONFIG_DEBUG_RODATA 2016-02-08 15:56:45 +00:00
l2c-common.c ARM: outer cache: add WARN_ON() to outer_disable() 2014-05-30 00:47:23 +01:00
l2c-l2x0-resume.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
Makefile ARM: uniphier: add outer cache support 2015-10-27 09:20:50 +09:00
mm.h ARM: provide common method to clear bits in CPU control register 2014-06-02 09:20:11 +01:00
mmap.c arm: mm: support ARCH_MMAP_RND_BITS 2016-01-14 16:00:49 -08:00
mmu.c ARM: SoC multiplatform code changes for v4.5 2016-01-20 18:03:56 -08:00
nommu.c ARM: io: convert ioremap*() to functions 2015-07-03 17:06:56 +01:00
pabort-legacy.S
pabort-v6.S
pabort-v7.S
pageattr.c ARM: 8311/1: Don't use is_module_addr in setting page attributes 2015-03-18 10:13:46 +00:00
pgd.c ARM: domains: keep vectors in separate domain 2015-08-21 13:55:53 +01:00
proc-arm7tdmi.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm9tdmi.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm720.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm740.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm920.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm922.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm925.S ARM: 8349/1: arch/arm/mm/proc-arm925.S: remove dead #ifdef block 2015-05-03 23:22:27 +01:00
proc-arm926.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm940.S Merge branches 'misc', 'vdso' and 'fixes' into for-next 2015-04-14 22:28:25 +01:00
proc-arm946.S Merge branches 'misc', 'vdso' and 'fixes' into for-next 2015-04-14 22:28:25 +01:00
proc-arm1020.S ARM: 8348/1: remove comments on CPU_ARM1020_CPU_IDLE 2015-05-03 23:22:09 +01:00
proc-arm1020e.S ARM: 8348/1: remove comments on CPU_ARM1020_CPU_IDLE 2015-05-03 23:22:09 +01:00
proc-arm1022.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-arm1026.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-fa526.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-feroceon.S ARM: 8350/1: proc-feroceon: Fix feroceon_proc_info macro 2015-05-03 23:23:09 +01:00
proc-macros.S Merge branches 'misc', 'vdso' and 'fixes' into for-next 2015-04-14 22:28:25 +01:00
proc-mohawk.S ARM: mohawk: allow building with MMU disabled 2015-12-01 21:44:25 +01:00
proc-sa110.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-sa1100.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-syms.c ARM: modules: don't export cpu_set_pte_ext when !MMU 2013-03-26 09:55:34 +00:00
proc-v6.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-v7-2level.S Merge branches 'arnd-fixes', 'clk', 'misc', 'v7' and 'fixes' into for-next 2015-06-12 21:18:08 +01:00
proc-v7-3level.S ARM: redo TTBR setup code for LPAE 2015-06-01 23:48:19 +01:00
proc-v7.S Merge branches 'misc' and 'misc-rc6' into for-linus 2016-01-05 11:07:28 +00:00
proc-v7m.S ARM: 8451/1: v7-M: Set an early stack for __v7m_setup 2015-11-16 18:34:38 +00:00
proc-xsc3.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
proc-xscale.S ARM: 8314/1: replace PROCINFO embedded branch with relative offset 2015-03-28 15:46:14 +00:00
pv-fixup-asm.S ARM: re-implement physical address space switching 2015-06-01 23:46:33 +01:00
tcm.h ARM: 7694/1: ARM, TCM: initialize TCM in paging_init(), instead of setup_arch() 2013-04-17 16:53:24 +01:00
tlb-fa.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
tlb-v4.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
tlb-v4wb.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
tlb-v4wbi.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
tlb-v6.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00
tlb-v7.S ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ 2014-07-18 12:29:04 +01:00