linux_dsm_epyc7002/arch/arc/include/asm
Vineet Gupta 8d56bec2f2 ARC: [mm] optimize needless full mm TLB flush on munmap
munmap ends up calling tlb_flush() which for ARC was flushing the entire
TLB unconditionally (by moving the MMU to a new ASID)

do_munmap
  unmap_region
    unmap_vmas
      unmap_single_vma
         unmap_page_range
            tlb_start_vma
            zap_pud_range
            tlb_end_vma()
  tlb_finish_mmu
    tlb_flush()  ---> unconditional flush_tlb_mm()

So even a single page munmap, a frequent operation when uClibc dynamic
linker (ldso) is loading the dependent shared libraries, would move the
the ASID multiple times - needlessly invalidating the pre-faulted TLB
entries (and increasing the rate of ASID wraparound + full TLB flush).

This is now optimised to only be called if tlb->full_mm (which means
for exit/execve) cases only. And for those cases, flush_tlb_mm() is
already optimised to be a no-op for mm->mm_users == 0.

So essentially there are no mmore full mm flushes - except for fork which
anyhow needs it for properly COW'ing parent address space.

munmap now needs to do TLB range flush, which is implemented with
tlb_end_vma()

Results
-------
1. ASID now consistenly moves by 4 during a simple ls (as opposed to 5 or
   7 before).

2. LMBench microbenchmark also shows improvements

Basic system parameters
------------------------------------------------------------------------------
Host                 OS Description              Mhz  tlb  cache  mem scal
                                                     pages line   par load
                                                           bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0404-gcc-4.4-ba   80     8    64 1.1000 1
3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0405-avoid-full   80     8    64 1.1200 1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
3.9-rc5-0 Linux 3.9.0-r   80 4.81 8.69 68.6 118. 239. 8.53 31.6 4839 13.K 34.K
3.9-rc5-0 Linux 3.9.0-r   80 4.46 8.36 53.8 91.3 223. 8.12 24.2 4725 13.K 33.K

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page 100fd
                        Create Delete Create Delete Latency Fault  Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
3.9-rc5-0 Linux 3.9.0-r  314.7  223.2 1054.9  390.2  3615.0 1.590 20.1 126.6
3.9-rc5-0 Linux 3.9.0-r  265.8  183.8 1014.2  314.1  3193.0 6.910 18.8 110.4

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07 13:44:00 +05:30
..
arcregs.h ARC: Boot #2: Verbose Boot reporting / feature verification 2013-02-15 23:16:07 +05:30
asm-offsets.h ARC: Generic Headers 2013-01-28 12:34:21 +05:30
atomic.h ARC: Atomic/bitops/cmpxchg/barriers 2013-02-11 20:00:30 +05:30
barrier.h ARC: Atomic/bitops/cmpxchg/barriers 2013-02-11 20:00:30 +05:30
bitops.h ARC: Atomic/bitops/cmpxchg/barriers 2013-02-11 20:00:30 +05:30
bug.h ARC: Fundamental ARCH data-types/defines 2013-02-11 20:00:34 +05:30
cache.h ARC: Cache Flush Management 2013-02-15 23:15:50 +05:30
cacheflush.h ARC: Cache Flush Management 2013-02-15 23:15:50 +05:30
checksum.h ARC: Checksum/byteorder/swab routines 2013-02-11 20:00:34 +05:30
clk.h ARC: [DeviceTree] Convert some Kconfig items to runtime values 2013-02-15 23:15:56 +05:30
cmpxchg.h ARC: Atomic/bitops/cmpxchg/barriers 2013-02-11 20:00:30 +05:30
current.h ARC: [optim] Cache "current" in Register r25 2013-02-15 23:15:58 +05:30
defines.h ARC: Boot #2: Verbose Boot reporting / feature verification 2013-02-15 23:16:07 +05:30
delay.h ARC: Timers/counters/delay management 2013-02-11 20:00:39 +05:30
disasm.h ARC: disassembly (needed by kprobes/kgdb/unaligned-access-emul) 2013-02-15 23:16:04 +05:30
dma-mapping.h arc: fix dma_address assignment during dma_map_sg() 2013-03-19 15:34:53 +05:30
dma.h ARC: I/O and DMA Mappings 2013-02-15 23:15:54 +05:30
elf.h ARC: Remove SET_PERSONALITY (tracks cross-arch change) 2013-03-18 14:37:05 +05:30
entry.h ARC: Fix the typo in event identifier flags used by ptrace 2013-03-20 18:45:45 +05:30
exec.h ARC: Fundamental ARCH data-types/defines 2013-02-11 20:00:34 +05:30
futex.h ARC: Futex support 2013-02-15 23:16:00 +05:30
io.h ARC: Add support for ioremap_prot API 2013-02-15 23:16:11 +05:30
irq.h ARC: Prepare interrupt code for external controllers 2013-05-07 13:43:58 +05:30
irqflags.h ARC: Add implicit compiler barrier to raw_local_irq* functions 2013-04-08 16:10:26 -07:00
Kbuild ARC: UAPI Disintegrate arch/arc/include/asm 2013-02-15 23:16:11 +05:30
kdebug.h ARC: Fundamental ARCH data-types/defines 2013-02-11 20:00:34 +05:30
kgdb.h ARC: make allyesconfig build breakages 2013-03-11 19:01:09 +05:30
kprobes.h ARC: kprobes support 2013-02-15 23:16:05 +05:30
linkage.h ARC: Support for single cycle Close Coupled Mem (CCM) 2013-02-15 23:16:10 +05:30
mach_desc.h ARC: make a copy of flat DT 2013-02-26 14:25:18 +05:30
mmu_context.h ARC: SMP support 2013-02-15 23:16:02 +05:30
mmu.h ARC: MMU Context Management 2013-02-15 23:15:51 +05:30
module.h ARC: DWARF2 .debug_frame based stack unwinder 2013-02-15 23:16:03 +05:30
mutex.h ARC: SMP support 2013-02-15 23:16:02 +05:30
page.h ARC: Add support for ioremap_prot API 2013-02-15 23:16:11 +05:30
perf_event.h ARC: perf support (software counters only) 2013-02-15 23:16:09 +05:30
pgalloc.h ARC: Page Table Management 2013-02-15 23:15:51 +05:30
pgtable.h ARC: SMP support 2013-02-15 23:16:02 +05:30
processor.h ARC: SMP support 2013-02-15 23:16:02 +05:30
prom.h ARC: [Review] Multi-platform image #2: Board callback Infrastructure 2013-02-15 23:16:13 +05:30
ptrace.h ARC: Fix the typo in event identifier flags used by ptrace 2013-03-20 18:45:45 +05:30
sections.h ARC: [DeviceTree] Basic support 2013-02-15 23:15:55 +05:30
segment.h ARC: uaccess friends 2013-02-11 20:00:31 +05:30
serial.h ARC: [TB10x] Add support for TB10x platform 2013-05-07 13:43:59 +05:30
setup.h ARC: UAPI Disintegrate arch/arc/include/asm 2013-02-15 23:16:11 +05:30
smp.h ARC: [Review] Multi-platform image #7: SMP common code to use callbacks 2013-02-15 23:16:16 +05:30
spinlock_types.h ARC: Spinlock/rwlock/mutex primitives 2013-02-11 20:00:35 +05:30
spinlock.h ARC: Spinlock/rwlock/mutex primitives 2013-02-11 20:00:35 +05:30
string.h ARC: String library 2013-02-11 20:00:35 +05:30
switch_to.h ARC: Process-creation/scheduling/idle-loop 2013-02-11 20:00:38 +05:30
syscall.h ARC: Syscall support (no-legacy-syscall ABI) 2013-02-11 20:00:38 +05:30
syscalls.h ARC: ABIv3: fork/vfork wrappers not needed in "no-legacy-syscall" ABI 2013-03-11 19:01:10 +05:30
thread_info.h ARC: Fundamental ARCH data-types/defines 2013-02-11 20:00:34 +05:30
timex.h ARC: Timers/counters/delay management 2013-02-11 20:00:39 +05:30
tlb-mmu1.h ARC: MMU Exception Handling 2013-02-15 23:15:52 +05:30
tlb.h ARC: [mm] optimize needless full mm TLB flush on munmap 2013-05-07 13:44:00 +05:30
tlbflush.h ARC: TLB flush Handling 2013-02-15 23:15:53 +05:30
uaccess.h ARC: [optim] uaccess __{get,put}_user() optimised 2013-02-11 20:00:32 +05:30
unaligned.h ARC: Unaligned access emulation 2013-02-15 23:16:06 +05:30
unwind.h ARC: DWARF2 .debug_frame based stack unwinder 2013-02-15 23:16:03 +05:30