Commit Graph

3 Commits

Author SHA1 Message Date
Benjamin Herrenschmidt
9e1b32caa5 mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()
mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()

Upcoming paches to support the new 64-bit "BookE" powerpc architecture
will need to have the virtual address corresponding to PTE page when
freeing it, due to the way the HW table walker works.

Basically, the TLB can be loaded with "large" pages that cover the whole
virtual space (well, sort-of, half of it actually) represented by a PTE
page, and which contain an "indirect" bit indicating that this TLB entry
RPN points to an array of PTEs from which the TLB can then create direct
entries. Thus, in order to invalidate those when PTE pages are deleted,
we need the virtual address to pass to tlbilx or tlbivax instructions.

The old trick of sticking it somewhere in the PTE page struct page sucks
too much, the address is almost readily available in all call sites and
almost everybody implemets these as macros, so we may as well add the
argument everywhere. I added it to the pmd and pud variants for consistency.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV]
Acked-by: Nick Piggin <npiggin@suse.de>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-27 12:10:38 -07:00
Paul Mundt
c20351846e sh: Flush only the needed range when unmapping a VMA.
This follows the ARM change from Aaro Koskinen:

	When unmapping N pages (e.g. shared memory) the amount of TLB
	flushes done can be (N*PAGE_SIZE/ZAP_BLOCK_SIZE)*N although it
	should be N at maximum. With PREEMPT kernel ZAP_BLOCK_SIZE is 8
	pages, so there is a noticeable performance penalty when
	unmapping a large VMA and the system is spending its time in
	flush_tlb_range().

	The problem is that tlb_end_vma() is always flushing the full VMA
	range. The subrange that needs to be flushed can be calculated by
	tlb_remove_tlb_entry(). This approach was suggested by Hugh
	Dickins, and is also used by other arches.

	The speed increase is roughly 3x for 8M mappings and for larger
	mappings even more.

Bits and peices are taken from the ARM patch as well as the existing
arch/um implementation that is quite similar.

The end result is a significant reduction in both partial and full TLB
flushes initiated through flush_tlb_range().

At the same time, the nommu implementation was broken, had a superfluous
cache flush, and subsequently would have triggered a BUG_ON() if a
code-path had triggered it. Tidy this up for correctness and provide a
nopped-out implementation there.

More background on the initial discussion can be found at:

	http://marc.info/?t=123609820900002&r=1&w=2
	http://marc.info/?t=123660375800003&r=1&w=2

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-03-17 21:19:49 +09:00
Paul Mundt
f15cbe6f1a sh: migrate to arch/sh/include/
This follows the sparc changes a439fe51a1.

Most of the moving about was done with Sam's directions at:

http://marc.info/?l=linux-sh&m=121724823706062&w=2

with subsequent hacking and fixups entirely my fault.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2008-07-29 08:09:44 +09:00