ARMv6 and greater introduced a new instruction ("bx") which can be used
to return from function calls. Recent CPUs perform better when the
"bx lr" instruction is used rather than the "mov pc, lr" instruction,
and this sequence is strongly recommended to be used by the ARM
architecture manual (section A.4.1.1).
We provide a new macro "ret" with all its variants for the condition
code which will resolve to the appropriate instruction.
Rather than doing this piecemeal, and miss some instances, change all
the "mov pc" instances to use the new macro, with the exception of
the "movs" instruction and the kprobes code. This allows us to detect
the "mov pc, lr" case and fix it up - and also gives us the possibility
of deploying this for other registers depending on the CPU selection.
Reported-by: Will Deacon <will.deacon@arm.com>
Tested-by: Stephen Warren <swarren@nvidia.com> # Tegra Jetson TK1
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> # mioa701_bootresume.S
Tested-by: Andrew Lunn <andrew@lunn.ch> # Kirkwood
Tested-by: Shawn Guo <shawn.guo@freescale.com>
Tested-by: Tony Lindgren <tony@atomide.com> # OMAPs
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com> # Armada XP, 375, 385
Acked-by: Sekhar Nori <nsekhar@ti.com> # DaVinci
Acked-by: Christoffer Dall <christoffer.dall@linaro.org> # kvm/hyp
Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com> # PXA3xx
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> # Xen
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> # ARMv7M
Tested-by: Simon Horman <horms+renesas@verge.net.au> # Shmobile
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
and are flagged as __cpuinit -- so if we remove the __cpuinit from
the arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
related content into no-ops as early as possible, since that will get
rid of these warnings. In any case, they are temporary and harmless.
This removes all the ARM uses of the __cpuinit macros from C code,
and all __CPUINIT from assembly code. It also had two ".previous"
section statements that were paired off against __CPUINIT
(aka .section ".cpuinit.text") that also get removed here.
[1] https://lkml.org/lkml/2013/5/20/589
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Let's do the changes properly and fix the same problem everywhere, not
just for one case.
Cc: <stable@vger.kernel.org> # kernels containing 15e0d9e37c or equivalent
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
ARM v7 architecture introduced the concept of cache levels and related
control registers. New processors like A7 and A15 embed an L2 unified cache
controller that becomes part of the cache level hierarchy. Some operations in
the kernel like cpu_suspend and __cpu_disable do not require a flush of the
entire cache hierarchy to DRAM but just the cache levels belonging to the
Level of Unification Inner Shareable (LoUIS), which in most of ARM v7 systems
correspond to L1.
The current cache flushing API used in cpu_suspend and __cpu_disable,
flush_cache_all(), ends up flushing the whole cache hierarchy since for
v7 it cleans and invalidates all cache levels up to Level of Coherency
(LoC) which cripples system performance when used in hot paths like hotplug
and cpuidle.
Therefore a new kernel cache maintenance API must be added to cope with
latest ARM system requirements.
This patch adds flush_cache_louis() to the ARM kernel cache maintenance API.
This function cleans and invalidates all data cache levels up to the
Level of Unification Inner Shareable (LoUIS) and invalidates the instruction
cache for processors that support it (> v7).
This patch also creates an alias of the cache LoUIS function to flush_kern_all
for all processor versions prior to v7, so that the current cache flushing
behaviour is unchanged for those processors.
v7 cache maintenance code implements a cache LoUIS function that cleans and
invalidates the D-cache up to LoUIS and invalidates the I-cache, according
to the new API.
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
The CPU reset functions disable the MMU and therefore must be executed
with an identity mapping in place.
This patch places the CPU reset functions into the .idmap.text section,
causing the idmap code to include them as part of the identity mapping.
Acked-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Only use the preallocated page table during the resume, not while
suspending. This avoids the overhead of having to switch unnecessarily
to the resume page table in the suspend path.
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Preallocate a page table and setup an identity mapping for the MMU
enable code. This means we don't have to "borrow" a page table to
do this, avoiding complexities with L2 cache coherency.
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
r1 stores the v:p offset from the CPU invariant resume code, and is
expected to be preserved by the CPU specific code. Overwriting it is
not a good idea.
We've managed to get away with it on sa1100 platforms because most
happen to have PHYS_OFFSET == PAGE_OFFSET, but that may not be the
case depending on kernel configuration. So fix this latent bug.
This fixes xsc3 as well which was saving and restoring this register
independently.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* 'next/cross-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc:
ARM: Consolidate the clkdev header files
ARM: set vga memory base at run-time
ARM: convert PCI defines to variables
ARM: pci: make pcibios_assign_all_busses use pci_has_flag
ARM: remove unnecessary mach/hardware.h includes
pci: move microblaze and powerpc pci flag functions into asm-generic
powerpc: rename ppc_pci_*_flags to pci_*_flags
Fix up conflicts in arch/microblaze/include/asm/pci-bridge.h
Commit 66a625a (ARM: mm: proc-macros: Add generic proc/cache/tlb struct
definition macros) introduced build errors when PM_SLEEP is not enabled.
The per-CPU do_suspend/do_resume functions are defined via the
preprocessor to constant 0. However, the macros which use these were
converted to assembly, resulting in undefined references to these
functions. Fix that by moving the ! ifdef section into proc-macros.S
and deleting it from all effected proc-*.S files.
Acked-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Remove some includes of mach/hardware.h which are not needed. hardware.h
will be removed completely for tegra and cns3xxx in follow on patch.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
CONFIG_PM is now set whenever we support either runtime PM in addition
to suspend and hibernate. This causes build errors when runtime PM is
enabled on a platform, but the CPU does not have the appropriate support
for suspend.
So, switch this code to use CONFIG_PM_SLEEP rather than CONFIG_PM to
allow runtime PM to be enabled without causing build errors.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This adds core support for saving and restoring CPU coprocessor
registers for suspend/resume support. This contains support for suspend
with ARM920, ARM926, SA11x0, PXA25x, PXA27x, PXA3xx, V6 and V7 CPUs.
Tested on Assabet and Tegra 2.
Tested-by: Colin Cross <ccross@android.com>
Tested-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Commit 81d11955bf ("ARM: 6405/1: Handle __flush_icache_all for
CONFIG_SMP_ON_UP") added a new function to struct cpu_cache_fns:
flush_icache_all(). It also implemented this for v6 and v7 but not
for v5 and backwards. Without the function pointer in place, we
will be calling wrong cache functions.
For example with ep93xx we get following:
Unable to handle kernel paging request at virtual address ee070f38
pgd = c0004000
[ee070f38] *pgd=00000000
Internal error: Oops: 80000005 [#1] PREEMPT
last sysfs file:
Modules linked in:
CPU: 0 Not tainted (2.6.36+ #1)
PC is at 0xee070f38
LR is at __dma_alloc+0x11c/0x2d0
pc : [<ee070f38>] lr : [<c0032c8c>] psr: 60000013
sp : c581bde0 ip : 00000000 fp : c0472000
r10: c0472000 r9 : 000000d0 r8 : 00020000
r7 : 0001ffff r6 : 00000000 r5 : c0472400 r4 : c5980000
r3 : c03ab7e0 r2 : 00000000 r1 : c59a0000 r0 : c5980000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: c000717f Table: c0004000 DAC: 00000017
Process swapper (pid: 1, stack limit = 0xc581a270)
[<c0032c8c>] (__dma_alloc+0x11c/0x2d0)
[<c0032e5c>] (dma_alloc_writecombine+0x1c/0x24)
[<c0204148>] (ep93xx_pcm_preallocate_dma_buffer+0x44/0x60)
[<c02041c0>] (ep93xx_pcm_new+0x5c/0x88)
[<c01ff188>] (snd_soc_instantiate_cards+0x8a8/0xbc0)
[<c01ff59c>] (soc_probe+0xfc/0x134)
[<c01adafc>] (platform_drv_probe+0x18/0x1c)
[<c01acca4>] (driver_probe_device+0xb0/0x16c)
[<c01ac284>] (bus_for_each_drv+0x48/0x84)
[<c01ace90>] (device_attach+0x50/0x68)
[<c01ac0f8>] (bus_probe_device+0x24/0x44)
[<c01aad7c>] (device_add+0x2fc/0x44c)
[<c01adfa8>] (platform_device_add+0x104/0x15c)
[<c0015eb8>] (simone_init+0x60/0x94)
[<c0021410>] (do_one_initcall+0xd0/0x1a4)
__dma_alloc() calls (inlined) __dma_alloc_buffer() which ends up
calling dmac_flush_range(). Now since the entries in the
arm920_cache_fns are shifted by one, we jump into address 0xee070f38
which is actually next instruction after the arm920_cache_fns
structure.
So implement flush_icache_all() for the rest of the supported CPUs
using a generic 'invalidate I cache' instruction.
Signed-off-by: Mika Westerberg <mika.westerberg@iki.fi>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When hotplug CPU is enabled, we need to keep the list of supported CPUs,
their setup functions, and __lookup_processor_type in place so that we
can find and initialize secondary CPUs. Move these into the __CPUINIT
section.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
All implementations of cpu_proc_fin() start by disabling interrupts
and then flush caches. Rather than have every processors proc_fin()
implementation do this, move it out into generic code - and move the
cache flush past setup_mm_for_reboot() (so it can benefit from having
caches still enabled.)
This allows cpu_proc_fin() to become independent of the L1/L2 cache
types, and eventually move the L2 cache flushing into the L2 support
code.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
These are now unused, and so can be removed.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-By: Santosh Shilimkar <santosh.shilimkar@ti.com>
Check whether L2 is present or not in XSC3. If it's present, enable L2
immediately.
Disabling L2 after L2 is enabled that would result in unpredicatable behavior
of XSC3 processor.
Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com>
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
This adds a better sched_clock() to the IOP platform,
implemented using its new clocksource support.
Tested on n2100, compile-tested for all plat-iop machines.
[dan.j.williams@intel.com: allow early cp6 access]
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Instruction fault status register, IFSR, was introduced on ARMv6 to
provide status information about the last insturction fault. It
needed for proper prefetch abort handling.
Now we have three prefetch abort model:
* legacy - for CPUs before ARMv6. They doesn't provide neither
IFSR nor IFAR. We simulate IFSR with section translation fault
status for them to generalize code;
* ARMv6 - provides IFSR, but not IFAR;
* ARMv7 - provides both IFSR and IFAR.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
PXA935 has changed its implementor ID from Intel to Marvell, this
patch modifies arch/arm/boot/compressed/head.S and proc-xsc3.S to
support a smooth bootup.
Signed-off-by: Eric Miao <eric.miao@marvell.com>
Commit 40df2d1d "[ARM] Update Xscale and Xscale3 PTE mappings" was
fingered by git-bisect for a boot failure on iop13xx. The change made
L_PTE_MT_WRITETHROUGH mappings L2-uncacheable. Russell points out that
this mapping is used for the vector page. Given the regression, and the
fact this page is used often, restore the old behaviour.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
As of the previous commit, MT_DEVICE_IXP2000 encodes to the same
PTE bit encoding as MT_DEVICE, so it's now redundant. Convert
MT_DEVICE_IXP2000 to use MT_DEVICE instead, and remove its aliases.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
There are actually only four separate implementations of set_pte_ext.
Use assembler macros to insert code for these into the proc-*.S files.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Remove includes of asm/hardware.h in addition to asm/arch/hardware.h.
Then, since asm/hardware.h only exists to include asm/arch/hardware.h,
update everything to directly include asm/arch/hardware.h and remove
asm/hardware.h.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
(20072fd0c9 lost most of its changes
somehow, came from a mbox archive applied with git-am. No idea
what happened. This puts back the missing bits. --rmk)
The initial patch from Lothar, and Lennert make it into a cleaner
one, modified and tested on PXA320 by Eric Miao.
This patch moves the L2 cache operations out of proc-xsc3.S into
dedicated outer cache support code.
CACHE_XSC3L2 can be deselected so no L2 cache specific code will be
linked in, and that L2 enable bit will not be set, this applies to
the following cases:
a. _only_ PXA300/PXA310 support included and no L2 cache wanted
b. PXA320 support included, but want L2 be disabled
So the enabling of L2 depends on two things:
- CACHE_XSC3L2 is selected
- and L2 cache is present
Where the latter is only a safeguard (previous testing shows it works
OK even when this bit is turned on).
IXP series of processors with XScale3 cannot disable L2 cache for the
moment since they depend on the L2 cache for its coherent memory, so
IXP may always select CACHE_XSC3L2.
Other L2 relevant bits are always turned on (i.e. the original code
enclosed by #if L2_CACHE_ENABLED .. #endif), as they showed no side
effects. Specifically, these bits are:
- OC bits in TTBASE register (table walk outer cache attributes)
- LLR Outer Cache Attributes (OC) in Auxiliary Control Register
Signed-off-by: Lothar WaÃ<9f>mann <LW@KARO-electronics.de>
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Signed-off-by: Eric Miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The proc-*.S files have the _prefetch_abort pointer placed at the end
of the processor structure but the cpu-multi32.h defines it in the
second position. The patch also fixes the support for XSC3 and the
MMU-less CPUs (740, 7tdmi, 940, 946 and 9tdmi).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch cleans up proc-xsc3:
- Correct a number of typos.
- Fix up indentation in a number of places.
- Change references to the various caches to be more clear about
whether we're talking about the L1 D, the L1 I or the unified L2
cache.
- Rename "drain write buffer" to "data write barrier", the official
name used in the Manzano manual.
- Change the xsc3 cpu name from "XScale-Core3" to "XScale-V3 based
processor".
Also, since a previously merged patch implements proper support for
using a MAC or iWMMXt coprocessor on xsc3 platforms, we no longer
need to enable access to CP0 on boot.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Deepak Saxena has agreed to hand xsc3 maintainership over to me.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
L_PTE_ASID is not really required to be stored in every PTE, since we
can identify it via the address passed to set_pte_at(). So, create
set_pte_ext() which takes the address of the PTE to set, the Linux
PTE value, and the additional CPU PTE bits which aren't encoded in
the Linux PTE value.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Remove BTB_ENABLE from proc-xsc3.S
On some early revisions of xsc3 enabling the branch target buffer can cause
crashes, see erratum #42.
Cc: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Merge L_PTE_COHERENT with L_PTE_SHARED and free up a L_PTE_* bit.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
These files want to provide/access ELF hwcap information, so should
be including asm/elf.h rather than asm/procinfo.h
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On some CPUs, bit 4 of section mappings means "update the
cache when written to". On others, this bit is required to
be one, and others it's required to be zero. Finally, on
ARMv6 and above, setting it turns on "no execute" and prevents
speculative prefetches.
With all these combinations, no one value fits all CPUs, so we
have to pick a value depending on the CPU type, and the area
we're mapping.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Most MMU-based CPUs have a restriction on the setting of the data cache
enable and mmu enable bits in the control register, whereby if the data
cache is enabled, the MMU must also be enabled. Enabling the data
cache without the MMU is an invalid combination.
However, there are CPUs where the data cache can be enabled without the
MMU.
In order to allow these CPUs to take advantage of that, provide a
method whereby each proc-*.S file defines the control regsiter value
for use with nommu (with the MMU disabled.) Later on, when we add
support for enabling the MMU on these devices, we can adjust the
"crval" macro to also enable the data cache for nommu.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
We don't enable the BTB on the ixp2350 as that can cause weird
crashes (erratum #42.) However, some bootloaders enable the BTB,
which means that we have to disable the BTB explicitly.
Found thanks to Tom Rini.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch from Lennert Buytenhek
This patch adds support for the I/O coherent cache available on the
xsc3. The approach is to provide a simple API to determine whether the
chipset supports coherency by calling arch_is_coherent() and then
setting the appropriate system memory PTE and PMD bits. In addition,
we call this API on dma_alloc_coherent() and dma_map_single() calls.
A generic version exists that will compile out all the coherency-related
code that is not needed on the majority of ARM systems.
Note that we do not check for coherency in the dma_alloc_writecombine()
function as that still requires a special PTE setting. We also don't
touch dma_mmap_coherent() as that is a special ARM-only API that is by
definition only used on non-coherent system.
Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
Adapt xsc3 to the changes in 74945c8616
(xsc3 was written before but merged after the latter went in.)
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Patch from Lennert Buytenhek
This patch adds support for the new XScale v3 core. This is an
ARMv5 ISA core with the following additions:
- L2 cache
- I/O coherency support (on select chipsets)
- Low-Locality Reference cache attributes (replaces mini-cache)
- Supersections (v6 compatible)
- 36-bit addressing (v6 compatible)
- Single instruction cache line clean/invalidate
- LRU cache replacement (vs round-robin)
I attempted to merge the XSC3 support into proc-xscale.S, but XSC3
cores have separate errata and have to handle things like L2, so it
is simpler to keep it separate.
L2 cache support is currently a build option because the L2 enable
bit must be set before we enable the MMU and there is no easy way to
capture command line parameters at this point.
There are still optimizations that can be done such as using LLR for
copypage (in theory using the exisiting mini-cache code) but those
can be addressed down the road.
Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>