Commit Graph

192 Commits

Author SHA1 Message Date
Rabin Vincent
19e6e5e539 ARM: 8547/1: dma-mapping: store buffer information
Keep a list of allocated DMA buffers so that we can store metadata in
alloc() which we later need in free().

Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-03-04 23:35:17 +00:00
Doug Anderson
14d3ae2efe ARM: 8507/1: dma-mapping: Use DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc
If we know that TLB efficiency will not be an issue when memory is
accessed then it's not terribly important to allocate big chunks of
memory.  The whole point of allocating the big chunks was that it would
make TLB usage efficient.

As Marek Szyprowski indicated:
    Please note that mapping memory with larger pages significantly
    improves performance, especially when IOMMU has a little TLB
    cache. This can be easily observed when multimedia devices do
    processing of RGB data with 90/270 degree rotation
Image rotation is distinctly an operation that needs to bounce around
through memory, so it makes sense that TLB efficiency is important
there.

Video decoding, on the other hand, is a fairly sequential operation.
During video decoding it's not expected that we'll be jumping all over
memory.  Decoding video is also pretty heavy and the TLB misses aren't a
huge deal.  Presumably most HW video acceleration users of dma-mapping
will not care about huge pages and will set DMA_ATTR_ALLOC_SINGLE_PAGES.

Allocating big chunks of memory is quite expensive, especially if we're
doing it repeatedly and memory is full.  In one (out of tree) usage model
it is common that arm_iommu_alloc_attrs() is called 16 times in a row,
each one trying to allocate 4 MB of memory.  This is called whenever the
system encounters a new video, which could easily happen while the
memory system is stressed out.  In fact, on certain social media
websites that auto-play video and have infinite scrolling, it's quite
common to see not just one of these 16x4MB allocations but 2 or 3 right
after another.  Asking the system even to do a small amount of extra
work to give us big chunks in this case is just not a good use of time.

Allocating big chunks of memory is also expensive indirectly.  Even if
we ask the system not to do ANY extra work to allocate _our_ memory,
we're still potentially eating up all big chunks in the system.
Presumably there are other users in the system that aren't quite as
flexible and that actually need these big chunks.  By eating all the big
chunks we're causing extra work for the rest of the system.  We also may
start making other memory allocations fail.  While the system may be
robust to such failures (as is the case with dwc2 USB trying to allocate
buffers for Ethernet data and with WiFi trying to allocate buffers for
WiFi data), it is yet another big performance hit.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-02-11 15:33:38 +00:00
Doug Anderson
33298ef6d8 ARM: 8505/1: dma-mapping: Optimize allocation
The __iommu_alloc_buffer() is expected to be called to allocate pretty
sizeable buffers.  Upon simple tests of video I saw it trying to
allocate 4,194,304 bytes.  The function tries to allocate large chunks
in order to optimize IOMMU TLB usage.

The current function is very, very slow.

One problem is the way it keeps trying and trying to allocate big
chunks.  Imagine a very fragmented memory that has 4M free but no
contiguous pages at all.  Further imagine allocating 4M (1024 pages).
We'll do the following memory allocations:
- For page 1:
  - Try to allocate order 10 (no retry)
  - Try to allocate order 9 (no retry)
  - ...
  - Try to allocate order 0 (with retry, but not needed)
- For page 2:
  - Try to allocate order 9 (no retry)
  - Try to allocate order 8 (no retry)
  - ...
  - Try to allocate order 0 (with retry, but not needed)
- ...
- ...

Total number of alloc() calls for this case is:
  sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
  => 9228
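
The same figure can be reproduced with exact integer arithmetic.  This is
a minimal sketch that only restates the formula above; the function name
is hypothetical and it does not model the allocator itself:

```python
def worst_case_alloc_calls(total_pages):
    """Count alloc() attempts using the per-index formula from the text."""
    # For i >= 1, i.bit_length() equals floor(log2(i)) + 1, with no float rounding.
    return sum(i.bit_length() for i in range(1, total_pages + 1))


print(worst_case_alloc_calls(1024))  # 9228
```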

The above is obviously worst case, but given how slow alloc can be we
really want to try to avoid even somewhat bad cases.  I timed the old
code with a device under memory pressure and it wasn't hard to see it
take more than 120 seconds to allocate 4 megs of memory! (NOTE: testing
was done on kernel 3.14, so possibly mainline would behave
differently).

A second problem is that allocating big chunks under memory pressure
is just not a great idea unless we really need them.  We can make do
pretty well with smaller chunks so it's probably wise to leave bigger
chunks for other users once memory pressure is on.

Let's adjust the allocation like this:

1. If a big chunk fails, stop trying too hard and bump down to lower
   order allocations.
2. Don't try useless orders.  The whole point of big chunks is to
   optimize the TLB and it can really only make use of 2M, 1M, 64K and
   4K sizes.
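
The two rules can be sketched as a small userspace model.  This is an
illustration only, not the kernel code: it assumes 4K pages (so 2M, 1M,
64K and 4K map to orders 9, 8, 4 and 0), and try_alloc() is a
hypothetical stand-in for the page allocator.

```python
# Orders for 2M, 1M, 64K and 4K chunks, assuming 4K pages.
IOMMU_ORDERS = [9, 8, 4, 0]


def alloc_buffer(total_pages, try_alloc):
    """Return (orders of the chunks obtained, number of allocation attempts).

    try_alloc(order) is a hypothetical callback that returns True when a
    chunk of 2**order pages could be allocated.
    """
    chunks = []
    attempts = 0
    idx = 0
    remaining = total_pages
    while remaining > 0:
        order = IOMMU_ORDERS[idx]
        if (1 << order) > remaining:
            idx += 1  # chunk is larger than what is left; use a smaller order
            continue
        attempts += 1
        if try_alloc(order):
            chunks.append(order)
            remaining -= 1 << order
        elif order == 0:
            raise MemoryError("no single pages left")
        else:
            idx += 1  # rule 1: a failed order is never tried again
    return chunks, attempts


def fully_fragmented(order):
    """Memory with free pages but no contiguous runs at all."""
    return order == 0


chunks, attempts = alloc_buffer(1024, fully_fragmented)
print(len(chunks), attempts)  # 1024 1027
```

In this model the fully fragmented 4M case costs 1027 attempts (three
failed big orders plus 1024 single pages) instead of 9228.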

We'll still tend to eat up a bunch of big chunks, but that might be the
right answer for some users.  A future patch could possibly add a new
DMA_ATTR that would let the caller decide that TLB optimization isn't
important and that we should use smaller chunks.  Presumably this would
be a sane strategy for some callers.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-02-11 15:33:37 +00:00
Tetsuo Handa
1d5cfdb076 tree wide: use kvfree() than conditional kfree()/vfree()
There are many locations that do

  if (memory_was_allocated_by_vmalloc)
    vfree(ptr);
  else
    kfree(ptr);

but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory
using is_vmalloc_addr().  Unless callers have special reasons, we can
replace this branch with kvfree().  Please check and reply if you find
problems.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Boris Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22 17:02:18 -08:00
Dan Williams
3e6110fd54 Revert "scatterlist: use sg_phys()"
commit db0fa0cb01 "scatterlist: use sg_phys()" did replacements of
the form:

    phys_addr_t phys = page_to_phys(sg_page(s));
    phys_addr_t phys = sg_phys(s) & PAGE_MASK;

However, this breaks platforms where sizeof(phys_addr_t) >
sizeof(unsigned long).  Revert for 4.3 and 4.4 to make room for a
combined helper in 4.5.
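
A minimal sketch of how such a replacement can lose address bits.  The
message does not spell out the mechanism; this assumes PAGE_MASK has the
width of unsigned long (32 bits here) while phys_addr_t is 64 bits, and
the helper names are hypothetical:

```python
PAGE_SHIFT = 12
ULONG_BITS = 32  # assumption: 32-bit unsigned long, 64-bit phys_addr_t

# PAGE_MASK computed at unsigned long width: bits above bit 31 do not exist.
PAGE_MASK_ULONG = ~((1 << PAGE_SHIFT) - 1) & ((1 << ULONG_BITS) - 1)


def page_base_truncating(phys):
    """Model of `sg_phys(s) & PAGE_MASK` with a zero-extended 32-bit mask."""
    return phys & PAGE_MASK_ULONG


def page_base_full_width(phys):
    """Page-aligned address computed at full physical address width."""
    return (phys >> PAGE_SHIFT) << PAGE_SHIFT


phys = 0x1_2345_6789  # a physical address above 4GiB
print(hex(page_base_truncating(phys)))  # 0x23456000
print(hex(page_base_full_width(phys)))  # 0x123456000
```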

Cc: <stable@vger.kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: db0fa0cb01 ("scatterlist: use sg_phys()")
Suggested-by: Joerg Roedel <joro@8bytes.org>
Reported-by: Vitaly Lavrov <vel21ripn@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-12-15 12:54:06 -08:00
Mel Gorman
d0164adc89 mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access to one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimistic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM identifies
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and were depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Marek Szyprowski
7e31210349 ARM: 8427/1: dma-mapping: add support for offset parameter in dma_mmap()
IOMMU-based dma_mmap() implementation lacked proper support for offset
parameter used in mmap call (it always assumed that mapping starts from
offset zero). This patch adds support for offset parameter to IOMMU-based
implementation.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: stable@vger.kernel.org  # v3.6+
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-10-03 16:36:45 +01:00
Marek Szyprowski
371f0f085f ARM: 8426/1: dma-mapping: add missing range check in dma_mmap()
dma_mmap() function in IOMMU-based dma-mapping implementation lacked
a check for valid range of mmap parameters (offset and buffer size), which
might have caused access beyond the allocated buffer. This patch fixes
this issue.
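
A range check of this kind can be sketched as follows.  This is not the
patch itself: the function name, the page-based units and the choice of
-ENXIO as the error are assumptions made for illustration.

```python
import errno


def check_mmap_range(buffer_pages, vm_pgoff, vma_pages):
    """Return 0 if [vm_pgoff, vm_pgoff + vma_pages) lies inside the buffer.

    All arguments are page counts; a negative errno is returned otherwise.
    """
    if vm_pgoff >= buffer_pages:
        return -errno.ENXIO  # mapping would start beyond the buffer
    if vma_pages > buffer_pages - vm_pgoff:
        return -errno.ENXIO  # mapping would run past the end of the buffer
    return 0


print(check_mmap_range(16, 4, 12))  # 0
print(check_mmap_range(16, 4, 13) == -errno.ENXIO)  # True
```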

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: stable@vger.kernel.org  # v3.6+
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-10-03 16:36:45 +01:00
Linus Torvalds
99bc7215bc Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
 "Three fixes and a resulting cleanup for -rc2:

   - Andre Przywara reported that he was seeing a warning with the new
     cast inside DMA_ERROR_CODE's definition, and fixed the incorrect
     use.

   - Doug Anderson noticed that kgdb causes a "scheduling while atomic"
     bug.

   - OMAP5 folk noticed that their Thumb-2 compiled X servers crashed
     when enabling support to cover ARMv6 CPUs due to a kernel bug
     leaking some conditional context into the signal handler"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8425/1: kgdb: Don't try to stop the machine when setting breakpoints
  ARM: 8437/1: dma-mapping: fix build warning with new DMA_ERROR_CODE definition
  ARM: get rid of needless #if in signal handling code
  ARM: fix Thumb2 signal handling when ARMv6 is enabled
2015-09-19 21:05:02 -07:00
Andre Przywara
90cde5584a ARM: 8437/1: dma-mapping: fix build warning with new DMA_ERROR_CODE definition
Commit 96231b2686: ("ARM: 8419/1: dma-mapping: harmonize definition
of DMA_ERROR_CODE") changed the definition of DMA_ERROR_CODE to use
dma_addr_t, which makes the compiler barf on assigning this to an
"int" variable on ARM with LPAE enabled:
*************
In file included from /src/linux/include/linux/dma-mapping.h:86:0,
                 from /src/linux/arch/arm/mm/dma-mapping.c:21:
/src/linux/arch/arm/mm/dma-mapping.c: In function '__iommu_create_mapping':
/src/linux/arch/arm/include/asm/dma-mapping.h:16:24: warning:
overflow in implicit constant conversion [-Woverflow]
 #define DMA_ERROR_CODE (~(dma_addr_t)0x0)
                        ^
/src/linux/arch/arm/mm/dma-mapping.c:1252:15: note: in expansion of
macro 'DMA_ERROR_CODE'
  int i, ret = DMA_ERROR_CODE;
               ^
*************

Remove the actually unneeded initialization of "ret" in
__iommu_create_mapping() and move the variable declaration inside the
for-loop to make the scope of this variable more clear.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-09-16 23:58:46 +01:00
Christoph Hellwig
6894258eda dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}
Since 2009 we have had a nice asm-generic header implementing lots of DMA API
functions for architectures using struct dma_map_ops, but unfortunately
it's still missing a lot of APIs that all architectures still have to
duplicate.

This series consolidates the remaining functions, although we still need
arch opt outs for two of them as a few architectures have very
non-standard implementations.

This patch (of 5):

The coherent DMA allocator works the same over all architectures supporting
dma_map operations.

This patch consolidates them and converges the minor differences:

 - the debug_dma helpers are now called from all architectures, including
   those that were previously missing them
 - dma_alloc_from_coherent and dma_release_from_coherent are now always
   called from the generic alloc/free routines instead of the ops.
   dma-mapping-common.h always includes dma-coherent.h to get the definitions
   for them, or the stubs if the architecture doesn't support this feature
 - checks for ->alloc / ->free presence are removed.  There is only one
   magic instance of dma_map_ops without them (mic_dma_ops) and that one
   is x86 only anyway.

Besides that only x86 needs special treatment to substitute a default device
if none is passed and tweak the gfp_flags.  An optional arch hook is provided
for that.

[linux@roeck-us.net: fix build]
[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Linus Torvalds
c706c7eb0d Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM development updates from Russell King:
 "Included in this update:

   - moving PSCI code from ARM64/ARM to drivers/

   - removal of some architecture internals from global kernel view

   - addition of software based "privileged no access" support using the
     old domains register to turn off the ability for kernel
     loads/stores to access userspace.  Only the proper accessors will
     be usable.

   - addition of early fixup support for early console

   - re-addition (and reimplementation) of OMAP special interconnect
     barrier

   - removal of finish_arch_switch()

   - only expose cpuX/online in sysfs if hotpluggable

   - a number of code cleanups"

* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (41 commits)
  ARM: software-based priviledged-no-access support
  ARM: entry: provide uaccess assembly macro hooks
  ARM: entry: get rid of multiple macro definitions
  ARM: 8421/1: smp: Collapse arch_cpu_idle_dead() into cpu_die()
  ARM: uaccess: provide uaccess_save_and_enable() and uaccess_restore()
  ARM: mm: improve do_ldrd_abort macro
  ARM: entry: ensure that IRQs are enabled when calling syscall_trace_exit()
  ARM: entry: efficiency cleanups
  ARM: entry: get rid of asm_trace_hardirqs_on_cond
  ARM: uaccess: simplify user access assembly
  ARM: domains: remove DOMAIN_TABLE
  ARM: domains: keep vectors in separate domain
  ARM: domains: get rid of manager mode for user domain
  ARM: domains: move initial domain setting value to asm/domains.h
  ARM: domains: provide domain_mask()
  ARM: domains: switch to keeping domain value in register
  ARM: 8419/1: dma-mapping: harmonize definition of DMA_ERROR_CODE
  ARM: 8417/1: refactor bitops functions with BIT_MASK() and BIT_WORD()
  ARM: 8416/1: Feroceon: use of_iomap() to map register base
  ARM: 8415/1: early fixmap support for earlycon
  ...
2015-09-03 16:27:01 -07:00
Russell King
40d3f02851 Merge branches 'cleanup', 'fixes', 'misc', 'omap-barrier' and 'uaccess' into for-linus
2015-09-03 15:28:37 +01:00
Linus Torvalds
d975f309a8 Merge branch 'for-4.3/sg' of git://git.kernel.dk/linux-block
Pull SG updates from Jens Axboe:
 "This contains a set of scatter-gather related changes/fixes for 4.3:

   - Add support for limited chaining of sg tables even for
     architectures that do not set ARCH_HAS_SG_CHAIN.  From Christoph.

   - Add sg chain support to target_rd.  From Christoph.

   - Fixup open coded sg->page_link in crypto/omap-sham.  From
     Christoph.

   - Fixup open coded crypto ->page_link manipulation.  From Dan.

   - Also from Dan, automated fixup of manual sg_unmark_end()
     manipulations.

   - Also from Dan, automated fixup of open coded sg_phys()
     implementations.

   - From Robert Jarzmik, addition of an sg table splitting helper that
     drivers can use"

* 'for-4.3/sg' of git://git.kernel.dk/linux-block:
  lib: scatterlist: add sg splitting function
  scatterlist: use sg_phys()
  crypto/omap-sham: remove an open coded access to ->page_link
  scatterlist: remove open coded sg_unmark_end instances
  crypto: replace scatterwalk_sg_chain with sg_chain
  target/rd: always chain S/G list
  scatterlist: allow limited chaining without ARCH_HAS_SG_CHAIN
2015-09-02 13:22:38 -07:00
Dan Williams
db0fa0cb01 scatterlist: use sg_phys()
Coccinelle cleanup to replace open coded sg to physical address
translations.  This is in preparation for introducing scatterlists that
reference __pfn_t.

// sg_phys.cocci: convert usage page_to_phys(sg_page(sg)) to sg_phys(sg)
// usage: make coccicheck COCCI=sg_phys.cocci MODE=patch

virtual patch

@@
struct scatterlist *sg;
@@

- page_to_phys(sg_page(sg)) + sg->offset
+ sg_phys(sg)

@@
struct scatterlist *sg;
@@

- page_to_phys(sg_page(sg))
+ sg_phys(sg) & PAGE_MASK

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-17 08:13:26 -06:00
Lorenzo Nava
21caf3a765 ARM: 8398/1: arm DMA: Fix allocation from CMA for coherent DMA
This patch allows the use of CMA for DMA coherent memory allocation.
At the moment if the input parameter "is_coherent" is set to true
the allocation is not made using the CMA, which I think is not the
desired behaviour.
The patch covers the allocation and free of memory for coherent
DMA.

Signed-off-by: Lorenzo Nava <lorenx4@gmail.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-08-04 16:16:21 +01:00
Russell King
1234e3fda9 ARM: reduce visibility of dmac_* functions
The dmac_* functions are private to the ARM DMA API implementation, and
should not be used by drivers.  In order to discourage their use, remove
their prototypes and macros from asm/*.h.

We have to leave dmac_flush_range() behind as Exynos and MSM IOMMU code
use these; once these sites are fixed, this can be moved also.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-08-01 22:25:04 +01:00
Marek Szyprowski
462859aa7b ARM: 8404/1: dma-mapping: fix off-by-one error in bitmap size check
nr_bitmaps member of mapping structure stores the number of already
allocated bitmaps and it is interpreted as loop iterator (it starts from
0 not from 1), so a comparison against number of possible bitmap
extensions should include this fact. This patch fixes this by changing
the extension failure condition. This issue has been introduced by
commit 4d852ef8c2 ("arm: dma-mapping: Add
support to extend DMA IOMMU mappings").
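
The off-by-one can be illustrated with a small model.  The exact
condition used in the patch is not quoted here, so both checks below are
assumptions about its shape, and the function names are hypothetical.

```python
def may_extend_old(nr_bitmaps, extensions):
    """Suspected old check: refuses only once nr_bitmaps exceeds the limit."""
    return not (nr_bitmaps > extensions)


def may_extend_fixed(nr_bitmaps, extensions):
    """nr_bitmaps is also the index of the next bitmap, so it must stay below
    the number of slots."""
    return not (nr_bitmaps >= extensions)


extensions = 4                     # slots available for bitmaps
bitmaps = [object()] * extensions  # all slots already in use
nr_bitmaps = len(bitmaps)

print(may_extend_old(nr_bitmaps, extensions))    # True: would use slot 4 of 0..3
print(may_extend_fixed(nr_bitmaps, extensions))  # False
```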

Reported-by: Hyungwon Hwang <human.hwang@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Hyungwon Hwang <human.hwang@samsung.com>
Cc: stable@vger.kernel.org  # v3.15+
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-07-17 15:08:40 +01:00
Russell King
9de44aa4dc Merge branches 'arnd-fixes', 'clk', 'misc', 'v7' and 'fixes' into for-next
2015-06-12 21:18:08 +01:00
Mike Looijmans
55af8a9164 ARM: 8387/1: arm/mm/dma-mapping.c: Add arm_coherent_dma_mmap
When dma-coherent transfers are enabled, the mmap call must
not change the pg_prot flags in the vma struct.

Split arm_dma_mmap into common and specific parts,
and add an "arm_coherent_dma_mmap" implementation that does
not alter the page protection flags.

Tested on a topic-miami board (Zynq) using the ACP port
to transfer data between FPGA and CPU using the Dyplo
framework. Without this patch, byte-wise access to mmapped
coherent DMA memory was about 20x slower because of the
memory being marked as non-cacheable, and transfer speeds
would not exceed 240MB/s.

After this patch, the mapped memory is cacheable and the
transfer speed is again 600MB/s (limited by the FPGA) when
the data is in the L2 cache, while data integrity is being
maintained.

The patch has no effect on non-coherent DMA.

Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-06-06 10:44:04 +01:00
Marek Szyprowski
1424532b21 ARM: 8347/1: dma-mapping: fix off-by-one check in arm_setup_iommu_dma_ops
Patch 22b3c181c6 ("arm: dma-mapping: limit
IOMMU mapping size") added a check for IO address space size. However
this patch broke IOMMU initialization for typical platforms initialized
from device tree, which get the default IO address space size of 4GiB.
This value doesn't fit into size_t and fails a check introduced by that
commit resulting in failed dma-mapping/iommu initialization. This patch
fixes this issue by adding proper support for full 4GiB address space
size.
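
Why 4GiB does not fit can be shown with a short model of size_t
truncation.  This assumes a 32-bit size_t, as on 32-bit ARM; to_size_t()
is a hypothetical helper, not a kernel function.

```python
SIZE_T_BITS = 32  # assumption: 32-bit ARM, so size_t is 32 bits wide


def to_size_t(value):
    """Model the truncation that happens when a value is stored in size_t."""
    return value & ((1 << SIZE_T_BITS) - 1)


full_space = 1 << 32  # the default 4GiB IO address space size

print(to_size_t(full_space))           # 0
print(hex(to_size_t(full_space - 1)))  # 0xffffffff
```

A size of 0 after truncation is indistinguishable from an empty address
space, which is why the full 4GiB case needs a wider type or separate
handling.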

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-05-03 23:21:55 +01:00
Linus Torvalds
bb0fd7ab09 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
 "Included in this update are both some long term fixes and some new
  features.

  Fixes:

   - An integer overflow in the calculation of ELF_ET_DYN_BASE.

   - Avoiding OOMs for high-order IOMMU allocations

   - SMP requires the data cache to be enabled for synchronisation
     primitives to work, so prevent the CPU_DCACHE_DISABLE option being
     visible on SMP builds.

   - A bug going back 10+ years in the noMMU ARM94* CPU support code,
     where it corrupts registers.  Found by folk getting Linux running
     on their cameras.

   - Versatile Express needs an errata workaround enabled for CPU
     hot-unplug to work.

  Features:

   - Clean up module linker by handling out of range relocations
     separately from relocation cases we don't handle.

   - Fix a long term bug in the pci_mmap_page_range() code, which we
     hope won't impact userspace (we hope there's no users of the
     existing broken interface.)

   - Don't map DMA coherent allocations when we don't have a MMU.

   - Drop experimental status for SMP_ON_UP.

   - Warn when DT doesn't specify ePAPR mandatory cache properties.

   - Add documentation concerning how we find the start of physical
     memory for AUTO_ZRELADDR kernels, detailing why we have chosen the
     mask and the implications of changing it.

   - Updates from Ard Biesheuvel to address some issues with large
     kernels (such as allyesconfig) failing to link.

   - Allow hibernation to work on modern (ARMv7) CPUs - this appears to
     have never worked in the past on these CPUs.

   - Enable IRQ_SHOW_LEVEL, which changes the /proc/interrupts output
     format (hopefully without userspace breaking...  let's hope that if
     it causes someone a problem, they tell us.)

   - Fix tegra-ahb DT offsets.

   - Rework ARM errata 643719 code (and ARMv7 flush_cache_louis()/
     flush_dcache_all()) code to be more efficient, and enable this
     errata workaround by default for ARMv7+SMP CPUs.  This complements
     the Versatile Express fix above.

   - Rework ARMv7 context code for errata 430973, so that only Cortex A8
     CPUs are impacted by the branch target buffer flush when this
     errata is enabled.  Also update the help text to indicate that all
     r1p* A8 CPUs are impacted.

   - Switch ARM to the generic show_mem() implementation, it conveys all
     the information which we were already reporting.

   - Prevent slow timer sources being used for udelay() - timers running
     at less than 1MHz are not useful for this, and can cause udelay()
     to return immediately, without any wait.  Using such a slow timer
     is silly.

   - VDSO support for 32-bit ARM, mainly for gettimeofday() using the
     ARM architected timer.

   - Perf support for Scorpion performance monitoring units"

vdso semantic conflict fixed up as per linux-next.

* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (52 commits)
  ARM: update errata 430973 documentation to cover Cortex A8 r1p*
  ARM: ensure delay timer has sufficient accuracy for delays
  ARM: switch to use the generic show_mem() implementation
  ARM: proc-v7: avoid errata 430973 workaround for non-Cortex A8 CPUs
  ARM: enable ARM errata 643719 workaround by default
  ARM: cache-v7: optimise test for Cortex A9 r0pX devices
  ARM: cache-v7: optimise branches in v7_flush_cache_louis
  ARM: cache-v7: consolidate initialisation of cache level index
  ARM: cache-v7: shift CLIDR to extract appropriate field before masking
  ARM: cache-v7: use movw/movt instructions
  ARM: allow 16-bit instructions in ALT_UP()
  ARM: proc-arm94*.S: fix setup function
  ARM: vexpress: fix CPU hotplug with CT9x4 tile.
  ARM: 8276/1: Make CPU_DCACHE_DISABLE depend on !SMP
  ARM: 8335/1: Documentation: DT bindings: Tegra AHB: document the legacy base address
  ARM: 8334/1: amba: tegra-ahb: detect and correct bogus base address
  ARM: 8333/1: amba: tegra-ahb: fix register offsets in the macros
  ARM: 8339/1: Enable CONFIG_GENERIC_IRQ_SHOW_LEVEL
  ARM: 8338/1: kexec: Relax SMP validation to improve DT compatibility
  ARM: 8337/1: mm: Do not invoke OOM for higher order IOMMU DMA allocations
  ...
2015-04-14 21:03:26 -07:00
Russell King
c848791f03 Merge branches 'misc', 'vdso' and 'fixes' into for-next
Conflicts:
	arch/arm/mm/proc-macros.S
2015-04-14 22:28:25 +01:00
Linus Torvalds
3be1b98e07 PCI changes for the v4.1 merge window:
Enumeration
     - Read capability list as dwords, not bytes (Sean O. Stalley)
 
   Resource management
     - Don't check for PNP overlaps with unassigned PCI BARs (Bjorn Helgaas)
     - Mark invalid BARs as unassigned (Bjorn Helgaas)
     - Show driver, BAR#, and resource on pci_ioremap_bar() failure (Bjorn Helgaas)
     - Fail pci_ioremap_bar() on unassigned resources (Bjorn Helgaas)
     - Assign resources before drivers claim devices (Yijing Wang)
     - Claim bus resources before pci_bus_add_devices() (Yijing Wang)
 
   Power management
     - Optimize device state transition delays (Aaron Lu)
     - Don't clear ASPM bits when the FADT declares it's unsupported (Matthew Garrett)
 
   Virtualization
     - Add ACS quirks for Intel 1G NICs (Alex Williamson)
 
   IOMMU
     - Add ptr to OF node arg to of_iommu_configure() (Murali Karicheri)
     - Move of_dma_configure() to device.c to help re-use (Murali Karicheri)
     - Fix size when dma-range is not used (Murali Karicheri)
     - Add helper functions pci_get[put]_host_bridge_device() (Murali Karicheri)
     - Add of_pci_dma_configure() to update DMA configuration (Murali Karicheri)
     - Update DMA configuration from DT (Murali Karicheri)
     - dma-mapping: limit IOMMU mapping size (Murali Karicheri)
     - Calculate device DMA masks based on DT dma-range size (Murali Karicheri)
 
   ARM Versatile host bridge driver
     - Check for devm_ioremap_resource() failures (Jisheng Zhang)
 
   Broadcom iProc host bridge driver
     - Add Broadcom iProc PCIe driver (Ray Jui)
 
   Marvell MVEBU host bridge driver
     - Add suspend/resume support (Thomas Petazzoni)
 
   Renesas R-Car host bridge driver
     - Fix position of MSI enable bit (Nobuhiro Iwamatsu)
     - Write zeroes to reserved PCIEPARL bits (Nobuhiro Iwamatsu)
     - Change PCIEPARL and PCIEPARH to PCIEPALR and PCIEPAUR (Nobuhiro Iwamatsu)
     - Verify that mem_res is 64K-aligned (Nobuhiro Iwamatsu)
 
   Samsung Exynos host bridge driver
     - Fix INTx enablement statement termination error (Jaehoon Chung)
 
   Miscellaneous
     - Make a shareable UUID for PCI firmware ACPI _DSM (Aaron Lu)
     - Clarify policy for vendor IDs in pci.txt (Michael S. Tsirkin)

Merge tag 'pci-v4.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI changes from Bjorn Helgaas:
 "Enumeration
    - Read capability list as dwords, not bytes (Sean O. Stalley)

  Resource management
    - Don't check for PNP overlaps with unassigned PCI BARs (Bjorn Helgaas)
    - Mark invalid BARs as unassigned (Bjorn Helgaas)
    - Show driver, BAR#, and resource on pci_ioremap_bar() failure (Bjorn Helgaas)
    - Fail pci_ioremap_bar() on unassigned resources (Bjorn Helgaas)
    - Assign resources before drivers claim devices (Yijing Wang)
    - Claim bus resources before pci_bus_add_devices() (Yijing Wang)

  Power management
    - Optimize device state transition delays (Aaron Lu)
    - Don't clear ASPM bits when the FADT declares it's unsupported (Matthew Garrett)

  Virtualization
    - Add ACS quirks for Intel 1G NICs (Alex Williamson)

  IOMMU
    - Add ptr to OF node arg to of_iommu_configure() (Murali Karicheri)
    - Move of_dma_configure() to device.c to help re-use (Murali Karicheri)
    - Fix size when dma-range is not used (Murali Karicheri)
    - Add helper functions pci_get[put]_host_bridge_device() (Murali Karicheri)
    - Add of_pci_dma_configure() to update DMA configuration (Murali Karicheri)
    - Update DMA configuration from DT (Murali Karicheri)
    - dma-mapping: limit IOMMU mapping size (Murali Karicheri)
    - Calculate device DMA masks based on DT dma-range size (Murali Karicheri)

  ARM Versatile host bridge driver
    - Check for devm_ioremap_resource() failures (Jisheng Zhang)

  Broadcom iProc host bridge driver
    - Add Broadcom iProc PCIe driver (Ray Jui)

  Marvell MVEBU host bridge driver
    - Add suspend/resume support (Thomas Petazzoni)

  Renesas R-Car host bridge driver
    - Fix position of MSI enable bit (Nobuhiro Iwamatsu)
    - Write zeroes to reserved PCIEPARL bits (Nobuhiro Iwamatsu)
    - Change PCIEPARL and PCIEPARH to PCIEPALR and PCIEPAUR (Nobuhiro Iwamatsu)
    - Verify that mem_res is 64K-aligned (Nobuhiro Iwamatsu)

  Samsung Exynos host bridge driver
    - Fix INTx enablement statement termination error (Jaehoon Chung)

  Miscellaneous
    - Make a shareable UUID for PCI firmware ACPI _DSM (Aaron Lu)
    - Clarify policy for vendor IDs in pci.txt (Michael S. Tsirkin)"

* tag 'pci-v4.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (36 commits)
  PCI: Read capability list as dwords, not bytes
  PCI: layerscape: Simplify platform_get_resource_byname() failure checking
  PCI: keystone: Don't dereference possible NULL pointer
  PCI: versatile: Check for devm_ioremap_resource() failures
  PCI: Don't clear ASPM bits when the FADT declares it's unsupported
  PCI: Clarify policy for vendor IDs in pci.txt
  PCI/ACPI: Optimize device state transition delays
  PCI: Export pci_find_host_bridge() for use inside PCI core
  PCI: Make a shareable UUID for PCI firmware ACPI _DSM
  PCI: Fix typo in Thunderbolt kernel message
  PCI: exynos: Fix INTx enablement statement termination error
  PCI: iproc: Add Broadcom iProc PCIe support
  PCI: iproc: Add DT docs for Broadcom iProc PCIe driver
  PCI: Export symbols required for loadable host driver modules
  PCI: Add ACS quirks for Intel 1G NICs
  PCI: mvebu: Add suspend/resume support
  PCI: Cleanup control flow
  sparc/PCI: Claim bus resources before pci_bus_add_devices()
  PCI: Assign resources before drivers claim devices (pci_scan_root_bus())
  PCI: Fail pci_ioremap_bar() on unassigned resources
  ...
2015-04-13 15:45:47 -07:00
Tomasz Figa
49f28aa6b0 ARM: 8337/1: mm: Do not invoke OOM for higher order IOMMU DMA allocations
IOMMU should be able to use single pages as well as bigger blocks, so if
higher order allocations fail, we should not affect state of the system,
with events such as OOM killer, but rather fall back to order 0
allocations.

This patch changes the behavior of ARM IOMMU DMA allocator to use
__GFP_NORETRY, which bypasses OOM invocation, for orders higher than
zero and, only if that fails, fall back to normal order 0 allocation
which might invoke OOM killer.
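
The fallback policy described above can be sketched in user-space C.
This is a minimal model, not the kernel code: FLAG_NORETRY only mimics
the role of __GFP_NORETRY, and alloc_pages_with_fallback(), alloc_fn and
demo_alloc() are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for __GFP_NORETRY in this model. */
#define FLAG_NORETRY 0x1u

/* Hypothetical allocator callback: returns true if an allocation of
 * 2^order pages succeeds with the given flags. */
typedef bool (*alloc_fn)(unsigned int order, unsigned int flags, void *ctx);

/* Demonstration allocator: orders above largest_ok_order always fail. */
struct demo_allocator {
    unsigned int largest_ok_order;
    unsigned int calls_without_noretry;
    unsigned int successes;
};

static bool demo_alloc(unsigned int order, unsigned int flags, void *ctx)
{
    struct demo_allocator *a = ctx;

    if (!(flags & FLAG_NORETRY))
        a->calls_without_noretry++;
    if (order > a->largest_ok_order)
        return false;
    a->successes++;
    return true;
}

/*
 * Allocate `count` pages, preferring large blocks. Higher-order attempts
 * carry FLAG_NORETRY so that a failure stays cheap; only the final order-0
 * attempt is made without it. Returns the number of pages allocated.
 */
static size_t alloc_pages_with_fallback(size_t count, unsigned int max_order,
                                        alloc_fn try_alloc, void *ctx)
{
    size_t done = 0;

    while (done < count) {
        size_t remaining = count - done;
        unsigned int order = max_order;

        /* Never allocate a block larger than what is still needed. */
        while (order > 0 && ((size_t)1 << order) > remaining)
            order--;

        /* Try progressively smaller blocks without retrying hard. */
        while (order > 0 && !try_alloc(order, FLAG_NORETRY, ctx))
            order--;

        /* Last resort: a normal order-0 allocation. */
        if (order == 0 && !try_alloc(0, 0, ctx))
            break;

        done += (size_t)1 << order;
    }
    return done;
}
```

In this model a fragmented allocator (largest_ok_order of 0) still
satisfies the request, one page at a time, and only those order-0 calls
are made without FLAG_NORETRY.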

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-04-02 09:58:25 +01:00
Will Deacon
89cfdb19a8 ARM: 8289/1: dma-mapping: use to_dma_iommu_mapping instead of accessing archdata
When using the IOMMU-backed DMA ops for a device, we store a pointer to
the dma_iommu_mapping structure (used to keep track of the address
space) in the archdata.mapping field of the struct device.

Rather than access this field directly, use the to_dma_iommu_mapping
helper in dma-mapping, so that we don't really care where the mapping
information is held.

Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-03-18 10:15:53 +00:00
Murali Karicheri
22b3c181c6 arm: dma-mapping: limit IOMMU mapping size
arm_iommu_create_mapping() takes its size argument as a size_t, but
arm_setup_iommu_dma_ops() can be passed a larger value when it is
called from the OF code.  So limit the size to SIZE_MAX.
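
The clamp can be illustrated with a small stand-alone sketch.
clamp_mapping_size() is a hypothetical helper written for this example,
assuming the OF code supplies the size as a 64-bit value; it is not the
code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a 64-bit size to the largest value a size_t can hold.
 * On 64-bit builds the comparison is never true and the value passes
 * through unchanged; on 32-bit builds oversized values become SIZE_MAX. */
static size_t clamp_mapping_size(uint64_t size)
{
    return size > SIZE_MAX ? SIZE_MAX : (size_t)size;
}
```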

Tested-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> (AMD Seattle)
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
CC: Joerg Roedel <joro@8bytes.org>
CC: Grant Likely <grant.likely@linaro.org>
CC: Rob Herring <robh+dt@kernel.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Arnd Bergmann <arnd@arndb.de>
2015-03-12 11:43:09 -05:00
Russell King
8bf1268f48 ARM: dma-api: fix off-by-one error in __dma_supported()
When validating the mask against the amount of memory we have available
(so that we can trap 32-bit DMA addresses with >32-bits memory), we had
not taken account of the fact that max_pfn is the maximum PFN number
plus one that would be in the system.

There are several references in the code which bear this out:

mm/page_owner.c:
	for (; pfn < max_pfn; pfn++) {
	}

arch/x86/kernel/setup.c:
	high_memory = (void *)__va(max_pfn * PAGE_SIZE - 1)
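
A minimal sketch of the corrected comparison, assuming 4 KiB pages and a
DMA mask expressed as the highest reachable address.
mask_covers_memory() is a hypothetical name, not the kernel's
__dma_supported(), and it assumes max_pfn is greater than zero.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * max_pfn is one past the last valid page frame, so the highest frame the
 * device must be able to reach is (max_pfn - 1), not max_pfn itself.
 */
static bool mask_covers_memory(uint64_t mask, uint64_t max_pfn)
{
    uint64_t last_pfn = max_pfn - 1;
    uint64_t mask_pfn = mask >> PAGE_SHIFT;

    return mask_pfn >= last_pfn;
}
```

With exactly 4 GiB of memory starting at address zero, max_pfn is
0x100000 and a 32-bit mask reaches frame 0xFFFFF, so the mask is accepted.
Comparing against max_pfn directly would reject it.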

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-03-10 19:48:35 +00:00
Carlo Caione
6e8266e333 ARM: 8304/1: Respect NO_KERNEL_MAPPING when we don't have an IOMMU
Even without an iommu, NO_KERNEL_MAPPING is still convenient to save on
kernel address space in places where we don't need a kernel mapping.
Implement support for it in the two places where we're creating an
expensive mapping.

__alloc_from_pool uses an internal pool from which we already have
virtual addresses, so it's not relevant, and __alloc_simple_buffer uses
alloc_pages, which will always return a lowmem page, which is already
mapped into kernel space, so we can't prevent a mapping for it in that
case.

Signed-off-by: Jasper St. Pierre <jstpierre@mecheye.net>
Signed-off-by: Carlo Caione <carlo@caione.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Daniel Drake <dsd@endlessm.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-02-23 14:43:59 +00:00
Linus Torvalds
90c453ca22 Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fix from Russell King:
 "Just one fix this time around.  __iommu_alloc_buffer() can cause a
  BUG() if dma_alloc_coherent() is called with either __GFP_DMA32 or
  __GFP_HIGHMEM set.  The patch from Alexandre addresses this"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8305/1: DMA: Fix kzalloc flags in __iommu_alloc_buffer()
2015-02-22 09:57:16 -08:00
Alexandre Courbot
23be7fdafa ARM: 8305/1: DMA: Fix kzalloc flags in __iommu_alloc_buffer()
There doesn't seem to be any valid reason to allocate the pages array
with the same flags as the buffer itself. Doing so can eventually lead
to the following safeguard in mm/slab.c's cache_grow() being hit:

        if (unlikely(flags & GFP_SLAB_BUG_MASK)) {
                pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK);
                BUG();
        }

This happens when buffers are allocated with __GFP_DMA32 or
__GFP_HIGHMEM.

Fix this by allocating the pages array with GFP_KERNEL to follow what is
done elsewhere in this file. Using GFP_KERNEL in __iommu_alloc_buffer()
is safe because atomic allocations are handled by __iommu_alloc_atomic().

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-02-20 11:14:42 +00:00
Linus Torvalds
5659c0e470 Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
 "A number of ARM fixes, the biggest is fixing a regression caused by
  appended DT blobs exceeding 64K, causing the decompressor fixup code
  to fail to patch the DT blob.  Another important fix is for the ASID
  allocator from Will Deacon which prevents some rare crashes seen on
  some systems.  Lastly, there's a build fix for v7M systems when printk
  support is disabled.

  The last two remaining fixes are more cosmetic - the IOMMU one
  prevents an annoying harmless warning message, and we disable the
  kernel strict memory permissions on non-MMU which can't support it
  anyway"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8299/1: mm: ensure local active ASID is marked as allocated on rollover
  ARM: 8298/1: ARM_KERNMEM_PERMS only works with MMU enabled
  ARM: 8295/1: fix v7M build for !CONFIG_PRINTK
  ARM: 8294/1: ATAG_DTB_COMPAT: remove the DT workspace's hardcoded 64KB size
  ARM: 8288/1: dma-mapping: don't detach devices without an IOMMU during teardown
2015-02-04 09:42:55 -08:00
Laurent Pinchart
eab8d6530c arm: dma-mapping: Set DMA IOMMU ops in arm_iommu_attach_device()
Commit 4bb25789ed ("arm: dma-mapping: plumb our iommu mapping ops
into arch_setup_dma_ops") moved the setting of the DMA operations from
arm_iommu_attach_device() to arch_setup_dma_ops() where the DMA
operations to be used are selected based on whether the device is
connected to an IOMMU. However, the IOMMU detection scheme requires the
IOMMU driver to be ported to the new IOMMU of_xlate API. As no driver
has been ported yet, this effectively breaks all IOMMU ARM users that
depend on the IOMMU being handled transparently by the DMA mapping API.

Fix this by restoring the setting of DMA IOMMU ops in
arm_iommu_attach_device() and splitting the rest of the function into a
new internal __arm_iommu_attach_device() function, called by
arch_setup_dma_ops().

Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Olof Johansson <olof@lixom.net>
2015-01-29 10:56:27 -08:00
Will Deacon
c2273a1853 ARM: 8288/1: dma-mapping: don't detach devices without an IOMMU during teardown
When tearing down the DMA ops for a device via of_dma_deconfigure, we
unconditionally detach the device from its IOMMU domain. For devices
that aren't actually behind an IOMMU, this produces a "Not attached"
warning message on the console.

This patch changes the teardown code so that we don't detach from the
IOMMU domain when there isn't an IOMMU dma mapping to start with.

Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-01-29 15:22:44 +00:00
Linus Torvalds
6f51ee709e ARM: SoC/iommu configuration for 3.19

Merge tag 'iommu-config-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC/iommu configuration update from Arnd Bergmann:
 "The iommu-config branch contains work from Will Deacon, quoting his
  description:

    This series adds automatic IOMMU and DMA-mapping configuration for
    OF-based DMA masters described using the generic IOMMU devicetree
    bindings. Although there is plenty of future work around splitting up
    iommu_ops, adding default IOMMU domains and sorting out automatic IOMMU
    group creation for the platform_bus, this is already useful enough for
    people to port over their IOMMU drivers and start using the new probing
    infrastructure (indeed, Marek has patches queued for the Exynos IOMMU).

  The branch touches core ARM and IOMMU driver files, and the respective
  maintainers (Russell King and Joerg Roedel) agreed to have the
  contents merged through the arm-soc tree.

  The final version was ready just before the merge window, so we ended
  up delaying it a bit longer than the rest, but we don't expect to see
  regressions because this is just additional infrastructure that will
  get used in drivers starting in 3.20 but is unused so far"

* tag 'iommu-config-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  iommu: store DT-probed IOMMU data privately
  arm: dma-mapping: plumb our iommu mapping ops into arch_setup_dma_ops
  arm: call iommu_init before of_platform_populate
  dma-mapping: detect and configure IOMMU in of_dma_configure
  iommu: fix initialization without 'add_device' callback
  iommu: provide helper function to configure an IOMMU for an of master
  iommu: add new iommu_ops callback for adding an OF device
  dma-mapping: replace set_arch_dma_coherent_ops with arch_setup_dma_ops
  iommu: provide early initialisation hook for IOMMU drivers
2014-12-16 14:53:01 -08:00
Will Deacon
4bb25789ed arm: dma-mapping: plumb our iommu mapping ops into arch_setup_dma_ops
This patch plumbs the existing ARM IOMMU DMA infrastructure (which isn't
actually called outside of a few drivers) into arch_setup_dma_ops, so
that we can use IOMMUs for DMA transfers in a more generic fashion.

Since this significantly complicates the arch_setup_dma_ops function,
it is moved out of line into dma-mapping.c. If CONFIG_ARM_DMA_USE_IOMMU
is not set, the iommu parameter is ignored and the normal ops are used
instead.

Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2014-12-01 16:51:35 +00:00
Laura Abbott
005757298f ARM: 8181/1: Drop extra return statement
Commit 513510ddba
(common: dma-mapping: introduce common remapping functions)
managed to end up with an extra return statement from the
original patch. Drop it.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-10-29 17:20:51 +00:00
Laura Abbott
36d0fd2198 arm: use genalloc for the atomic pool
ARM currently uses a bitmap for tracking atomic allocations.  genalloc
already handles this type of memory pool allocation so switch to using
that instead.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Riley <davidriley@chromium.org>
Cc: Olof Johansson <olof@lixom.net>
Cc: Ritesh Harjain <ritesh.harjani@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:52 -04:00
Laura Abbott
513510ddba common: dma-mapping: introduce common remapping functions
For architectures without coherent DMA, memory for DMA may need to be
remapped with coherent attributes.  Factor out the remapping code from
arm and put it in a common location to reduce code duplication.

As part of this, the arm APIs are now migrated away from
ioremap_page_range to the common APIs, which use map_vm_area for
remapping.  This should be an equivalent change, and using map_vm_area is
more correct because ioremap_page_range is intended to bring I/O
addresses into the CPU address space, not regular kernel-managed memory.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Riley <davidriley@chromium.org>
Cc: Olof Johansson <olof@lixom.net>
Cc: Ritesh Harjain <ritesh.harjani@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Mitchel Humpherys <mitchelh@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:52 -04:00
Joonsoo Kim
a254129e86 CMA: generalize CMA reserved area management functionality
Currently, there are two users of CMA functionality: the DMA subsystem
and KVM on powerpc.  Each has its own code to manage the CMA reserved
area, even though the two look very similar.  My guess is that this
stems from differing bitmap management needs: the KVM side wants a bitmap
with coarser granularity than one page per bit, and ends up using a
bitmap in which one bit represents 64 pages.

When I implement CMA-related patches, I have to change both places to
apply my change, which is painful.  I want to change this situation and
reduce future code management overhead through this patch.

This change could also help developers who want to use CMA in new
feature development, since they can use CMA easily without copying and
pasting this reserved area management code.

In previous patches, we prepared some features to generalize CMA
reserved area management, and now it's time to do it.  This patch moves
the core functions to mm/cma.c and changes the DMA APIs to use them.

There is no functional change in DMA APIs.
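
The difference in bitmap granularity mentioned above comes down to simple
arithmetic. A minimal sketch, where cma_bitmap_bits() and its
order_per_bit parameter are names chosen for this example (one bit covers
2^order_per_bit pages):

```c
#include <stddef.h>

/* Number of bitmap bits needed to track `pages` pages when each bit
 * covers 2^order_per_bit pages, rounding up for a partial last chunk. */
static size_t cma_bitmap_bits(size_t pages, unsigned int order_per_bit)
{
    size_t per_bit = (size_t)1 << order_per_bit;

    return (pages + per_bit - 1) / per_bit;
}
```

With one bit per page (order 0), a 4096-page area needs 4096 bits; with
one bit per 64 pages (order 6), the same area needs 64 bits.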

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Gleb Natapov <gleb@kernel.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06 18:01:16 -07:00
Russell King
6b076991dc ARM: DMA: ensure that old section mappings are flushed from the TLB
When setting up the CMA region, we must ensure that the old section
mappings are flushed from the TLB before replacing them with page
tables, otherwise we can suffer from mismatched aliases if the CPU
speculatively prefetches from these mappings at an inopportune time.

A mismatched alias can occur when the TLB contains a section mapping,
but a subsequent prefetch causes it to load a page table mapping,
resulting in the possibility of the TLB containing two matching
mappings for the same virtual address region.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-07-17 19:26:08 +01:00
Linus Torvalds
eb3d3ec567 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm into next
Pull ARM updates from Russell King:

 - Major clean-up of the L2 cache support code.  The existing mess was
   becoming rather unmaintainable through all the additions that others
   have done over time.  This turns it into a much nicer structure, and
   implements a few performance improvements as well.

 - Clean up some of the CP15 control register tweaks for alignment
   support, moving some code and data into alignment.c

 - DMA properties for ARM, from Santosh and reviewed by DT people.  This
   adds DT properties to specify bus translations we can't discover
   automatically, and to indicate whether devices are coherent.

 - Hibernation support for ARM

 - Make ftrace work with read-only text in modules

 - add suspend support for PJ4B CPUs

 - rework interrupt masking for undefined instruction handling, which
   allows us to enable interrupts earlier in the handling of these
   exceptions.

 - support for big endian page tables

 - fix stacktrace support to exclude stacktrace functions from the
   trace, and add save_stack_trace_regs() implementation so that kprobes
   can record stack traces.

 - Add support for the Cortex-A17 CPU.

 - Remove last vestiges of ARM710 support.

 - Removal of ARM "meminfo" structure, finally converting us solely to
   memblock to handle the early memory initialisation.

* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (142 commits)
  ARM: ensure C page table setup code follows assembly code (part II)
  ARM: ensure C page table setup code follows assembly code
  ARM: consolidate last remaining open-coded alignment trap enable
  ARM: remove global cr_no_alignment
  ARM: remove CPU_CP15 conditional from alignment.c
  ARM: remove unused adjust_cr() function
  ARM: move "noalign" command line option to alignment.c
  ARM: provide common method to clear bits in CPU control register
  ARM: 8025/1: Get rid of meminfo
  ARM: 8060/1: mm: allow sub-architectures to override PCI I/O memory type
  ARM: 8066/1: correction for ARM patch 8031/2
  ARM: 8049/1: ftrace/add save_stack_trace_regs() implementation
  ARM: 8065/1: remove last use of CONFIG_CPU_ARM710
  ARM: 8062/1: Modify ldrt fixup handler to re-execute the userspace instruction
  ARM: 8047/1: rwsem: use asm-generic rwsem implementation
  ARM: l2c: trial at enabling some Cortex-A9 optimisations
  ARM: l2c: add warnings for stuff modifying aux_ctrl register values
  ARM: l2c: print a warning with L2C-310 caches if the cache size is modified
  ARM: l2c: remove old .set_debug method
  ARM: l2c: kill L2X0_AUX_CTRL_MASK before anyone else makes use of this
  ...
2014-06-05 15:57:04 -07:00
Russell King
bd63ce27d9 Merge branch 'devel-stable' into for-next
2014-06-05 12:36:22 +01:00
Russell King
1fb333489f Merge branches 'alignment', 'fixes', 'l2c' (early part) and 'misc' into for-next
2014-06-05 12:35:52 +01:00
Russell King
6b74f61a47 DT support for 'dma-ranges' and 'dma-coherent' properties with ARM updates

Merge tag 'dt-dma-properties-for-arm' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone into devel-stable

DT support for 'dma-ranges' and 'dma-coherent' properties with ARM updates

- The 'dma-ranges' property handles restrictions on which system memory
  is DMA-able by way of dma_pfn_offset, which is maintained per device.
  Arch code then uses it for DMA address translation in such cases.
  dma_pfn_offset is updated accordingly during the DT device creation
  process.
- The 'dma-coherent' property is used to set up the arch's coherent dma_ops.
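
How a per-device PFN offset translates addresses can be sketched as
follows. This is an illustrative model assuming 4 KiB pages; the helper
names are hypothetical and the addresses in the note below are example
values, not taken from this merge.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* CPU physical address -> address as seen by the device. */
static uint64_t phys_to_dma_addr(uint64_t phys, uint64_t dma_pfn_offset)
{
    return phys - (dma_pfn_offset << PAGE_SHIFT);
}

/* Device (bus) address -> CPU physical address. */
static uint64_t dma_to_phys_addr(uint64_t dma, uint64_t dma_pfn_offset)
{
    return dma + (dma_pfn_offset << PAGE_SHIFT);
}
```

For example, if RAM sits at physical address 0x800000000 but the device
sees it at 0x80000000, the offset is 0x780000 page frames.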
2014-05-23 12:30:52 +01:00
Russell King
deace4a6b4 ARM: dma-mapping: avoid calling dma_cache_maint_page() on dev=>cpu
Avoid calling dma_cache_maint_page() when unmapping a DMA_TO_DEVICE
buffer.  The L1 cache ops never do anything in this circumstance, nor
do they ever need to - all that matters for this case is that the data
written is visible to the device before DMA starts.  What happens during
the transfer (provided the buffer is not written to) is of no real
consequence.

We already do this optimisation for the L2 cache.
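
The decision can be modelled in a few lines. This is a sketch only: the
enum and unmap_needs_cpu_cache_maint() are hypothetical stand-ins for the
kernel's dma_data_direction and the check added by this patch.

```c
#include <stdbool.h>

/* Hypothetical model of the DMA transfer direction. */
enum dma_dir_model {
    DIR_BIDIRECTIONAL,
    DIR_TO_DEVICE,
    DIR_FROM_DEVICE,
};

/*
 * On unmap, CPU cache maintenance is only needed if the device may have
 * written to the buffer. For a to-device transfer the data was already
 * made visible to the device at map time, so nothing remains to be done.
 */
static bool unmap_needs_cpu_cache_maint(enum dma_dir_model dir)
{
    return dir != DIR_TO_DEVICE;
}
```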

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-05-22 16:33:14 +01:00
Gioh Kim
e464ef16c4 arm: dma-mapping: add checking cma area initialized
If CMA is turned on but the CMA size is set to zero, the kernel should
behave as if CMA was not enabled at compile time.
Every DMA allocation should therefore check that a CMA area exists
before requesting memory from it.

Signed-off-by: Gioh Kim <gioh.kim@lge.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
[mszyprow: removed redundant empty line from the patch]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2014-05-22 08:09:31 +02:00
Ritesh Harjani
006f841db1 arm: dma-iommu: Clean up redundant variable
mapping->size can be derived from mapping->bits << PAGE_SHIFT,
which makes mapping->size redundant.

Clean this up.
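
A minimal sketch of deriving the size on demand instead of storing it,
assuming 4 KiB pages. struct iommu_mapping_model and mapping_size() are
hypothetical and only model the single field discussed here.

```c
#include <stddef.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Model of a mapping that tracks its IO address space in a bitmap
 * with one bit per page. */
struct iommu_mapping_model {
    size_t bits; /* number of pages covered by the bitmap */
};

/* The size in bytes follows from the bit count, so it need not be stored. */
static size_t mapping_size(const struct iommu_mapping_model *mapping)
{
    return mapping->bits << PAGE_SHIFT;
}
```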

Signed-off-by: Ritesh Harjani <ritesh.harjani@gmail.com>
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2014-05-20 13:43:26 +02:00
Santosh Shilimkar
2161c2485d ARM: dma: use phys_addr_t in __dma_page_[cpu_to_dev/dev_to_cpu]
On 32-bit ARM with the LPAE extension, physical addresses do not fit
into an unsigned long variable.

Fix this by using phys_addr_t instead of unsigned long.
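
The truncation problem can be shown in portable C by modelling the 32-bit
unsigned long with uint32_t and the LPAE-sized phys_addr_t with uint64_t.
Both type choices and the helper name are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stands in for phys_addr_t on an LPAE system (wider than 32 bits). */
typedef uint64_t phys_addr_model_t;

/* True if the address survives a round trip through a 32-bit variable,
 * which is what storing it in unsigned long amounts to on 32-bit ARM. */
static bool fits_in_32bit_ulong(phys_addr_model_t paddr)
{
    uint32_t truncated = (uint32_t)paddr;

    return (phys_addr_model_t)truncated == paddr;
}
```

An address such as 0x880000000 loses its upper bits in the 32-bit
variable, so cache maintenance would be applied to the wrong location.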

Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
2014-05-07 09:21:45 -04:00
Ritesh Harjani
59f0f119e8 arm: dma-mapping: Fix mapping size value
68efd7d2fb ("arm: dma-mapping: remove order parameter from
arm_iommu_create_mapping()") causes a kernel panic
because it sets mapping->size to the wrong value:

Unable to handle kernel NULL pointer dereference at virtual
address 000000a0
pgd = e7a84000
[000000a0] *pgd=00000000
...
PC is at bitmap_clear+0x48/0xd0
LR is at __iommu_remove_mapping+0x130/0x164

Fix it by correcting mapping->size value.

Signed-off-by: Ritesh Harjani <ritesh.harjani@gmail.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2014-04-23 15:07:00 +02:00