linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Tom Musta	17e8de7e18	powerpc: Unaligned stores and stmw are broken in emulation code The stmw instruction was incorrectly decoded as an update form instruction and thus the RA register was being clobbered. Also, the utility routine to write memory to unaligned addresses breaks the operation into smaller aligned accesses but was incorrectly incrementing the address by only one; it needs to increment the address by the size of the smaller aligned chunk. Signed-off-by: Tom Musta <tmusta@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-08-27 14:36:08 +10:00
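The second fix in the commit above (advance by the chunk size, not by one) can be illustrated with a minimal sketch. This is not the kernel routine: `aligned_chunk` and `store_unaligned` are hypothetical names, and memory is simulated as a byte array indexed by an offset `ea`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Largest power-of-two chunk (8, 4, 2 or 1 bytes) that is naturally
 * aligned at offset ea and no larger than the bytes remaining.
 */
static size_t aligned_chunk(size_t ea, size_t remaining)
{
    size_t c = 8;

    while (c > 1 && ((ea & (c - 1)) != 0 || c > remaining))
        c >>= 1;
    return c;
}

/*
 * Hypothetical helper (not the kernel code): store nb bytes from src
 * into the simulated memory mem at offset ea using only naturally
 * aligned accesses. Returns the number of accesses performed.
 */
size_t store_unaligned(uint8_t *mem, size_t ea,
                       const uint8_t *src, size_t nb)
{
    size_t accesses = 0;

    assert(mem != NULL && src != NULL);
    while (nb > 0) {
        size_t c = aligned_chunk(ea, nb);

        memcpy(mem + ea, src, c);  /* one aligned access of c bytes */
        ea += c;                   /* advance by the chunk size, not by 1 */
        src += c;
        nb -= c;
        accesses++;
    }
    return accesses;
}
```

For an 8-byte store at offset 1, the loop issues accesses of 1, 2, 4 and 1 bytes. Incrementing `ea` by one after each access would make later chunks overlap earlier ones, which matches the bug the commit describes.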
Anton Blanchard	7ffcf8ec26	powerpc: Fix little endian lppaca, slb_shadow and dtl_entry The lppaca, slb_shadow and dtl_entry hypervisor structures are big endian, so we have to byte swap them in little endian builds. LE KVM hosts will also need to be fixed but for now add an #error to remind us. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-08-14 15:33:35 +10:00
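A minimal sketch of what "byte swap them in little endian builds" means for a big-endian field. The helper names `load_be32` and `store_be32` are assumptions for illustration; kernel code would normally use its own byte-order helpers such as `be32_to_cpu()` and `cpu_to_be32()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Read a 32-bit field that is stored big endian, regardless of the
 * host's byte order. Hypothetical helper for illustration only.
 */
uint32_t load_be32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a host-order value into a big-endian 32-bit field. */
void store_be32(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```

Because the accessors work on individual bytes, the same source gives the correct value on both big-endian and little-endian builds.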
Michael Neuling	70a54a4fae	powerpc: Fix single step emulation of 32bit overflowed branches Check truncate_if_32bit() on final write to nip. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-20 16:55:13 +10:00
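The truncation applied on the final write to nip can be sketched as follows. This is an illustration under assumptions, not the sstep.c code: `SKETCH_MSR_64BIT` is a stand-in flag and `emulate_branch` is a hypothetical wrapper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag for this sketch: set when the CPU runs in 64-bit mode. */
#define SKETCH_MSR_64BIT (UINT64_C(1) << 63)

/* Keep only the low 32 bits of val when not in 64-bit mode. */
uint64_t truncate_if_32bit(uint64_t msr, uint64_t val)
{
    if ((msr & SKETCH_MSR_64BIT) == 0)
        val &= UINT64_C(0xffffffff);
    return val;
}

/*
 * Hypothetical emulation of a relative branch: compute the target and
 * truncate it on the final write to nip, so a 32-bit task wraps
 * around instead of overflowing into the upper 32 bits.
 */
uint64_t emulate_branch(uint64_t msr, uint64_t nip, int64_t offset)
{
    assert((offset & 3) == 0);  /* branch displacements are word aligned */
    return truncate_if_32bit(msr, nip + (uint64_t)offset);
}
```

In 32-bit mode a branch of +8 from 0xfffffffc lands at 0x4; in 64-bit mode the same branch lands at 0x100000004.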
Michael Neuling	280a5ba22c	powerpc/pseries: Improve stream generation comments in copypage/user No code changes, just documenting what's happening a little better. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-06-01 08:29:26 +10:00
Suzuki K. Poulose	5e249d4528	uprobes/powerpc: Add dependency on single step emulation Uprobes uses emulate_step in sstep.c, but we haven't explicitly specified the dependency. On pseries HAVE_HW_BREAKPOINT protects us, but 44x has no such luxury. Consolidate other users that depend on sstep and create a new config option. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com> Cc: linuxppc-dev@ozlabs.org Cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-01-29 11:35:06 +11:00
Anton Blanchard	1fbe9cf259	powerpc: Build kernel with -mcmodel=medium Finally remove the two level TOC and build with -mcmodel=medium. Unfortunately we can't build modules with -mcmodel=medium due to the tricks the kernel module loader plays with percpu data: # -mcmodel=medium breaks modules because it uses 32bit offsets from # the TOC pointer to create pointers where possible. Pointers into the # percpu data area are created by this method. # # The kernel module loader relocates the percpu data section from the # original location (starting with 0xd...) to somewhere in the base # kernel percpu data space (starting with 0xc...). We need a full # 64bit relocation for this to work, hence -mcmodel=large. On older kernels we fall back to the two level TOC (-mminimal-toc) Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2013-01-10 17:00:31 +11:00
Nishanth Aravamudan	c8adfeccee	powerpc: Fix VMX fix for memcpy case In `2fae7cdb60` ("powerpc: Fix VMX in interrupt check in POWER7 copy loops"), Anton inadvertently introduced a regression for memcpy on POWER7 machines. copyuser and memcpy diverge slightly in their use of cr1 (copyuser doesn't use it, but memcpy does) and you end up clobbering that register with your fix. That results in (taken from an FC18 kernel): [ 18.824604] Unrecoverable VMX/Altivec Unavailable Exception f20 at c000000000052f40 [ 18.824618] Oops: Unrecoverable VMX/Altivec Unavailable Exception, sig: 6 [#1] [ 18.824623] SMP NR_CPUS=1024 NUMA pSeries [ 18.824633] Modules linked in: tg3(+) be2net(+) cxgb4(+) ipr(+) sunrpc xts lrw gf128mul dm_crypt dm_round_robin dm_multipath linear raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua squashfs cramfs [ 18.824705] NIP: c000000000052f40 LR: c00000000020b874 CTR: 0000000000000512 [ 18.824709] REGS: c000001f1fef7790 TRAP: 0f20 Not tainted (3.6.0-0.rc6.git0.2.fc18.ppc64) [ 18.824713] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 4802802e XER: 20000010 [ 18.824726] SOFTE: 0 [ 18.824728] CFAR: 0000000000000f20 [ 18.824731] TASK = c000000fa7128400[0] 'swapper/24' THREAD: c000000fa7480000 CPU: 24 GPR00: 00000000ffffffc0 c000001f1fef7a10 c00000000164edc0 c000000f9b9a8120 GPR04: c000000f9b9a8124 0000000000001438 0000000000000060 03ffffff064657ee GPR08: 0000000080000000 0000000000000010 0000000000000020 0000000000000030 GPR12: 0000000028028022 c00000000ff25400 0000000000000001 0000000000000000 GPR16: 0000000000000000 7fffffffffffffff c0000000016b2180 c00000000156a500 GPR20: c000000f968c7a90 c0000000131c31d8 c000001f1fef4000 c000000001561d00 GPR24: 000000000000000a 0000000000000000 0000000000000001 0000000000000012 GPR28: c000000fa5c04f80 00000000000008bc c0000000015c0a28 000000000000022e [ 18.824792] NIP [c000000000052f40] .memcpy_power7+0x5a0/0x7c4 [ 18.824797] 
LR [c00000000020b874] .pcpu_free_area+0x174/0x2d0 [ 18.824800] Call Trace: [ 18.824803] [c000001f1fef7a10] [c000000000052c14] .memcpy_power7+0x274/0x7c4 (unreliable) [ 18.824809] [c000001f1fef7b10] [c00000000020b874] .pcpu_free_area+0x174/0x2d0 [ 18.824813] [c000001f1fef7bb0] [c00000000020ba88] .free_percpu+0xb8/0x1b0 [ 18.824819] [c000001f1fef7c50] [c00000000043d144] .throtl_pd_exit+0x94/0xd0 [ 18.824824] [c000001f1fef7cf0] [c00000000043acf8] .blkg_free+0x88/0xe0 [ 18.824829] [c000001f1fef7d90] [c00000000018c048] .rcu_process_callbacks+0x2e8/0x8a0 [ 18.824835] [c000001f1fef7e90] [c0000000000a8ce8] .__do_softirq+0x158/0x4d0 [ 18.824840] [c000001f1fef7f90] [c000000000025ecc] .call_do_softirq+0x14/0x24 [ 18.824845] [c000000fa7483650] [c000000000010e80] .do_softirq+0x160/0x1a0 [ 18.824850] [c000000fa74836f0] [c0000000000a94a4] .irq_exit+0xf4/0x120 [ 18.824854] [c000000fa7483780] [c000000000020c44] .timer_interrupt+0x154/0x4d0 [ 18.824859] [c000000fa7483830] [c000000000003be0] decrementer_common+0x160/0x180 [ 18.824866] --- Exception: 901 at .plpar_hcall_norets+0x84/0xd4 [ 18.824866] LR = .check_and_cede_processor+0x48/0x80 [ 18.824871] [c000000fa7483b20] [c00000000007f018] .check_and_cede_processor+0x18/0x80 (unreliable) [ 18.824877] [c000000fa7483b90] [c00000000007f104] .dedicated_cede_loop+0x84/0x150 [ 18.824883] [c000000fa7483c50] [c0000000006bc030] .cpuidle_enter+0x30/0x50 [ 18.824887] [c000000fa7483cc0] [c0000000006bc9f4] .cpuidle_idle_call+0x104/0x720 [ 18.824892] [c000000fa7483d80] [c000000000070af8] .pSeries_idle+0x18/0x40 [ 18.824897] [c000000fa7483df0] [c000000000019084] .cpu_idle+0x1a4/0x380 [ 18.824902] [c000000fa7483ec0] [c0000000008a4c18] .start_secondary+0x520/0x528 [ 18.824907] [c000000fa7483f90] [c0000000000093f0] .start_secondary_prolog+0x10/0x14 [ 18.824911] Instruction dump: [ 18.824914] 38840008 90030000 90e30004 38630008 7ca62850 7cc300d0 78c7e102 7cf01120 [ 18.824923] 78c60660 39200010 39400020 39600030 <7e00200c> 7c0020ce 38840010 409f001c [ 
18.824935] ---[ end trace 0bb95124affaaa45 ]--- [ 18.825046] Unrecoverable VMX/Altivec Unavailable Exception f20 at c000000000052d08 I believe the right fix is to make memcpy match usercopy and not use cr1. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@kernel.org> [v3.6]	2012-10-04 18:02:43 +10:00
Tiejun Chen	8e9f693715	powerpc/kprobe: Don't emulate store when kprobe stwu r1 We don't do the real store operation for kprobing 'stwu Rx,(y)R1' since this may corrupt the exception frame. Instead, we now do this operation safely in the exception return code after migrating the current exception frame below the kprobed function's stack. So we only update gpr[1] here and trigger a thread flag to mask this. Note that we should check whether this triggers a kernel stack overflow. Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-18 15:32:45 +10:00
Benjamin Herrenschmidt	636802ef96	powerpc: Don't use __put_user() in patch_instruction patch_instruction() can be called very early on ppc32, when the kernel isn't yet running at its linked address. That can cause the !is_kernel_addr() test in __put_user() to trip and call might_sleep(), which is very bad at that point during boot. Use a lower level function instead for now, at least until we get to rework the ppc32 boot process to do the code patching later, like ppc64 does. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-05 16:05:23 +10:00
Anton Blanchard	2fae7cdb60	powerpc: Fix VMX in interrupt check in POWER7 copy loops The enhanced prefetch hint patches corrupt the condition register that was used to check if we are in interrupt. Fix this by using cr1. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-08-24 20:26:09 +10:00
Anton Blanchard	dad477ccd6	powerpc: POWER7 copy_to_user/copy_from_user patch applied twice "powerpc: Use enhanced touch instructions in POWER7 copy_to_user/copy_from_user" was applied twice. Remove one. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-08-24 20:26:09 +10:00
Stephen Rothwell	1d5a436d2c	powerpc: Put the gpr save/restore functions in their own section This allows the linker to know that calls to them do not need to switch TOC and stop errors like the following when linking large configurations: powerpc64-linux-ld: drivers/built-in.o: In function `.gpiochip_is_requested': (.text+0x4): sibling call optimization to `_savegpr0_29' does not allow automatic multiple TOCs; recompile with -mminimal-toc or -fno-optimize-sibling-calls, or make `_savegpr0_29' extern Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-11 14:19:59 +10:00
Michael Neuling	e55174e911	powerpc: Fixes for instructions not using correct register naming These macros are using integers where they could be using logical names since they take registers. We are going to enforce this soon, so fix these up now. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-10 19:18:16 +10:00
Michael Neuling	86e32fdce7	powerpc: Change mtcrf to use real register names mtocrf define is just a wrapper around the real instructions so we can just use real register names here (ie. lower case). Also remove braces in macro so this is possible. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-10 19:18:11 +10:00
Michael Neuling	44ce6a5ee7	powerpc: Merge STK_REG/PARAM/FRAMESIZE Merge the defines of STACKFRAMESIZE, STK_REG, STK_PARAM from different places. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-10 19:18:03 +10:00
Michael Neuling	c75df6f96c	powerpc: Fix usage of register macros getting ready for %r0 change Anything that uses a constructed instruction (ie. from ppc-opcode.h) needs to use the new R0 macro, as %r0 is not going to work. Also convert usages of macros where we are just determining an offset (usually for a load/store), like: std r14,STK_REG(r14)(r1) We can't use STK_REG(r14) as %r14 doesn't work in the STK_REG macro since it's just calculating an offset. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-10 19:17:55 +10:00
Anton Blanchard	cf8fb5533f	powerpc: Optimise the 64bit optimised __clear_user I blame Mikey for this. He elevated my slightly dubious testcase: to benchmark status. And naturally we need to be number 1 at creating zeros. So let's improve __clear_user some more. As Paul suggests we can use dcbz for large lengths. This patch gets the destination cacheline aligned then uses dcbz on whole cachelines. Before: 10485760000 bytes (10 GB) copied, 0.414744 s, 25.3 GB/s After: 10485760000 bytes (10 GB) copied, 0.268597 s, 39.0 GB/s 39 GB/s, a new record. Signed-off-by: Anton Blanchard <anton@samba.org> Tested-by: Olof Johansson <olof@lixom.net> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-03 14:14:48 +10:00
Anton Blanchard	b3f271e86e	powerpc: POWER7 optimised memcpy using VMX and enhanced prefetch Implement a POWER7 optimised memcpy using VMX and enhanced prefetch instructions. This is a copy of the POWER7 optimised copy_to_user/copy_from_user loop. Detailed implementation and performance details can be found in commit `a66086b819` (powerpc: POWER7 optimised copy_to_user/copy_from_user using VMX). I noticed memcpy issues when profiling a RAID6 workload: .memcpy .async_memcpy .async_copy_data .__raid_run_ops .handle_stripe .raid5d .md_thread I created a simplified testcase by building a RAID6 array with 4 1GB ramdisks (booting with brd.rd_size=1048576): # mdadm -CR -e 1.2 /dev/md0 --level=6 -n4 /dev/ram[0-3] I then timed how long it took to write to the entire array: # dd if=/dev/zero of=/dev/md0 bs=1M Before: 892 MB/s After: 999 MB/s A 12% improvement. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-03 14:14:46 +10:00
Anton Blanchard	bce4b4bd91	powerpc: Use enhanced touch instructions in POWER7 copy_to_user/copy_from_user Version 2.06 of the POWER ISA introduced enhanced touch instructions, allowing us to specify a number of attributes including the length of a stream. This patch adds a software stream for both loads and stores in the POWER7 copy_tofrom_user loop. Since the setup is quite complicated and we have to use an eieio to ensure correct ordering of the "GO" command we only do this for copies above 4kB. To quantify any performance improvements we need a working set bigger than the caches so we operate on a 1GB file: # dd if=/dev/zero of=/tmp/foo bs=1M count=1024 And we compare how fast we can read the file: # dd if=/tmp/foo of=/dev/null bs=1M before: 7.7 GB/s after: 9.6 GB/s A 25% improvement. The worst case for this patch will be a completely L1 cache contained copy of just over 4kB. We can test this with the copy_to_user testcase we used to tune copy_tofrom_user originally: http://ozlabs.org/~anton/junkcode/copy_to_user.c # time ./copy_to_user2 -l 4224 -i 10000000 before: 6.807 s after: 6.946 s A 2% slowdown, which seems reasonable considering our data is unlikely to be completely L1 contained. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-03 14:14:45 +10:00
Anton Blanchard	fde69282b7	powerpc: POWER7 optimised copy_page using VMX and enhanced prefetch Implement a POWER7 optimised copy_page using VMX and enhanced prefetch instructions. We use enhanced prefetch hints to prefetch both the load and store side. We copy a cacheline at a time and fall back to regular loads and stores if we are unable to use VMX (eg we are in an interrupt). The following microbenchmark was used to assess the impact of the patch: http://ozlabs.org/~anton/junkcode/page_fault_file.c We test MAP_PRIVATE page faults across a 1GB file, 100 times: # time ./page_fault_file -p -l 1G -i 100 Before: 22.25s After: 18.89s 17% faster Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-03 14:14:44 +10:00
Anton Blanchard	6f7839e542	powerpc: Rename copyuser_power7_vmx.c to vmx-helper.c Subsequent patches will add more VMX library functions and it makes sense to keep all the c-code helper functions in the one file. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-03 14:14:43 +10:00
Anton Blanchard	a9514dc69d	powerpc: Use enhanced touch instructions in POWER7 copy_to_user/copy_from_user Version 2.06 of the POWER ISA introduced enhanced touch instructions, allowing us to specify a number of attributes including the length of a stream. This patch adds a software stream for both loads and stores in the POWER7 copy_tofrom_user loop. Since the setup is quite complicated and we have to use an eieio to ensure correct ordering of the "GO" command we only do this for copies above 4kB. To quantify any performance improvements we need a working set bigger than the caches so we operate on a 1GB file: # dd if=/dev/zero of=/tmp/foo bs=1M count=1024 And we compare how fast we can read the file: # dd if=/tmp/foo of=/dev/null bs=1M before: 7.7 GB/s after: 9.6 GB/s A 25% improvement. The worst case for this patch will be a completely L1 cache contained copy of just over 4kB. We can test this with the copy_to_user testcase we used to tune copy_tofrom_user originally: http://ozlabs.org/~anton/junkcode/copy_to_user.c # time ./copy_to_user2 -l 4224 -i 10000000 before: 6.807 s after: 6.946 s A 2% slowdown, which seems reasonable considering our data is unlikely to be completely L1 contained. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-03 14:14:42 +10:00
Anton Blanchard	17968fbbd1	powerpc: 64bit optimised __clear_user I noticed __clear_user high up in a profile of one of my RAID stress tests. The testcase was doing a dd from /dev/zero which ends up calling __clear_user. __clear_user is basically a loop with a single 4 byte store which is horribly slow. We can do much better by aligning the destination and doing 32 bytes of 8 byte stores in a loop. The following testcase was used to verify the patch: http://ozlabs.org/~anton/junkcode/stress_clear_user.c To show the improvement in performance I ran a dd from /dev/zero to /dev/null on a POWER7 box: Before: # dd if=/dev/zero of=/dev/null bs=1M count=10000 10485760000 bytes (10 GB) copied, 3.72379 s, 2.8 GB/s After: # time dd if=/dev/zero of=/dev/null bs=1M count=10000 10485760000 bytes (10 GB) copied, 0.728318 s, 14.4 GB/s Over 5x faster. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-03 14:14:41 +10:00
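The approach described for `__clear_user` above (align the destination, then issue 32 bytes of 8-byte stores per loop iteration) might look roughly like this in portable C. The real routine is PowerPC assembly with user-access fault handling; `clear_bytes` is a hypothetical name and the sketch ignores faults entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: zero len bytes at dst, widest stores in the middle. */
void clear_bytes(uint8_t *dst, size_t len)
{
    const uint64_t zero = 0;

    assert(dst != NULL || len == 0);

    /* Head: byte stores until the destination is 8-byte aligned. */
    while (len > 0 && ((uintptr_t)dst & 7) != 0) {
        *dst++ = 0;
        len--;
    }

    /* Main loop: 32 bytes per iteration as four 8-byte stores. */
    while (len >= 32) {
        memcpy(dst,      &zero, 8);
        memcpy(dst + 8,  &zero, 8);
        memcpy(dst + 16, &zero, 8);
        memcpy(dst + 24, &zero, 8);
        dst += 32;
        len -= 32;
    }

    /* Tail: remaining 8-byte stores, then single bytes. */
    while (len >= 8) {
        memcpy(dst, &zero, 8);
        dst += 8;
        len -= 8;
    }
    while (len > 0) {
        *dst++ = 0;
        len--;
    }
}
```

The fixed-size `memcpy` calls stand in for 8-byte store instructions; compilers typically turn them into single stores while keeping the code free of aliasing problems.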
Steven Rostedt	b6e3796834	powerpc: Have patch_instruction detect faults For ftrace to use the patch_instruction code, it needs to check for faults on write. Ftrace updates code all over the kernel, and we need to know if code is updated or not due to protections that are placed on some portions of the kernel. If ftrace does not detect a fault, it will error later on, and it will be much more difficult to find the problem. By changing patch_instruction() to detect faults, then ftrace will be able to make use of it too. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-07-03 14:14:38 +10:00
Paul Mackerras	1629372caa	powerpc: Use the new generic strncpy_from_user() and strnlen_user() This is much the same as for SPARC except that we can do the find_zero() function more efficiently using the count-leading-zeroes instructions. Tested on 32-bit and 64-bit PowerPC. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-27 21:00:07 -07:00
Anton Blanchard	694caf0255	powerpc: Remove CONFIG_POWER4_ONLY Remove CONFIG_POWER4_ONLY, the option is badly named and only does two things: - It wraps the MMU segment table code. With feature fixups there is little downside to compiling this in. - It uses the newer mtocrf instruction in various assembly functions. Instead of making this a compile option just do it at runtime via a feature fixup. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-04-30 15:37:26 +10:00
David Howells	ae3a197e3d	Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PowerPC. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> cc: linuxppc-dev@lists.ozlabs.org	2012-03-28 18:30:02 +01:00
Stephen Rothwell	f5339277eb	powerpc: Remove FW_FEATURE ISERIES from arch code This is no longer selectable, so just remove all the dependent code. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-03-21 11:16:11 +11:00
Anton Blanchard	a66086b819	powerpc: POWER7 optimised copy_to_user/copy_from_user using VMX Implement a POWER7 optimised copy_to_user/copy_from_user using VMX. For large aligned copies this new loop is over 10% faster, and for large unaligned copies it is over 200% faster. If we take a fault we fall back to the old version, this keeps things relatively simple and easy to verify. On POWER7 unaligned stores rarely slow down - they only flush when a store crosses a 4KB page boundary. Furthermore this flush is handled completely in hardware and should be 20-30 cycles. Unaligned loads on the other hand flush much more often - whenever crossing a 128 byte cache line, or a 32 byte sector if either sector is an L1 miss. Considering this information we really want to get the loads aligned and not worry about the alignment of the stores. Microbenchmarks confirm that this approach is much faster than the current unaligned copy loop that uses shifts and rotates to ensure both loads and stores are aligned. We also want to try and do the stores in cacheline aligned, cacheline sized chunks. If the store queue is unable to merge an entire cacheline of stores then the L2 cache will have to do a read/modify/write. Even worse, we will serialise this with the stores in the next iteration of the copy loop since both iterations hit the same cacheline. Based on this, the new loop does the following things: 1 - 127 bytes Get the source 8 byte aligned and use 8 byte loads and stores. Pretty boring and similar to how the current loop works. 128 - 4095 bytes Get the source 8 byte aligned and use 8 byte loads and stores, 1 cacheline at a time. We aren't doing the stores in cacheline aligned chunks so we will potentially serialise once per cacheline. Even so it is much better than the loop we have today. 4096 - bytes If both source and destination have the same alignment get them both 16 byte aligned, then get the destination cacheline aligned. Do cacheline sized loads and stores using VMX. 
If source and destination do not have the same alignment, we get the destination cacheline aligned, and use permute to do aligned loads. In both cases the VMX loop should be optimal - we always do aligned loads and stores and are always doing stores in cacheline aligned, cacheline sized chunks. To be able to use VMX we must be careful about interrupts and sleeping. We don't use the VMX loop when in an interrupt (which should be rare anyway) and we wrap the VMX loop in disable/enable_pagefault and fall back to the existing copy_tofrom_user loop if we do need to sleep. The VMX breakpoint of 4096 bytes was chosen using this microbenchmark: http://ozlabs.org/~anton/junkcode/copy_to_user.c Since we are using VMX and there is a cost to saving and restoring the user VMX state there are two broad cases we need to benchmark: - Best case - userspace never uses VMX - Worst case - userspace always uses VMX In reality a userspace process will sit somewhere between these two extremes. Since we need to test both aligned and unaligned copies we end up with 4 combinations. The point at which the VMX loop begins to win is: 0% VMX aligned 2048 bytes unaligned 2048 bytes 100% VMX aligned 16384 bytes unaligned 8192 bytes Considering this is a microbenchmark, the data is hot in cache and the VMX loop has better store queue merging properties we set the breakpoint to 4096 bytes, a little below the unaligned breakpoints. Some future optimisations we can look at: - Looking at the perf data, a significant part of the cost when a task is always using VMX is the extra exception we take to restore the VMX state. As such we should do something similar to the x86 optimisation that restores FPU state for heavy users. 
ie: /* * If the task has used fpu the last 5 timeslices, just do a full * restore of the math state immediately to avoid the trap; the * chances of needing FPU soon are obviously high now */ preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5; and /* * fpu_counter contains the number of consecutive context switches * that the FPU is used. If this is over a threshold, the lazy fpu * saving becomes unlazy to save the trap. This is an unsigned char * so that after 256 times the counter wraps and the behavior turns * lazy again; this to deal with bursty apps that only use FPU for * a short time */ - We could create a paca bit to mirror the VMX enabled MSR bit and check that first, avoiding multiple calls to enable_kernel_altivec. That should help with iovec based system calls like readv. - We could have two VMX breakpoints, one for when we know the user VMX state is loaded into the registers and one when it isn't. This could be a second bit in the paca so we can calculate the break points quickly. - One suggestion from Ben was to save and restore the VSX registers we use inline instead of using enable_kernel_altivec. [BenH: Fixed a problem with preempt and fixed build without CONFIG_ALTIVEC] Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-12-19 14:40:40 +11:00
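The three size ranges listed in the commit above (1 to 127 bytes, 128 to 4095 bytes, 4096 bytes and up) amount to a dispatch on copy length. A minimal sketch follows. The enum, the function name, and the choice to send large copies in interrupt context to the cacheline loop are all assumptions made for illustration; the commit only states that the VMX loop is not used in an interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three loops the commit describes. */
enum copy_path {
    COPY_SMALL,      /* 8-byte loads and stores */
    COPY_CACHELINE,  /* 8-byte loads and stores, one cacheline at a time */
    COPY_VMX         /* cacheline-sized VMX loads and stores */
};

/* Pick a loop from the copy length and the calling context. */
enum copy_path choose_copy_path(size_t len, bool in_interrupt)
{
    assert(len > 0);

    if (len < 128)
        return COPY_SMALL;
    /* Assumption: without VMX, large copies use the cacheline loop. */
    if (len < 4096 || in_interrupt)
        return COPY_CACHELINE;
    return COPY_VMX;
}
```

The 4096-byte threshold is the VMX breakpoint the commit settles on after benchmarking the 0% and 100% VMX cases.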
Anton Blanchard	d715e433b7	powerpc: Copy down exception vectors after feature fixups kdump fails because we try to execute an HV only instruction. Feature fixups are being applied after we copy the exception vectors down to 0 so they miss out on any updates. We have always had this issue but it only became critical in v3.0 when we added CFAR support (breaks POWER5) and v3.1 when we added POWERNV (breaks everyone). Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@kernel.org> [v3.0+] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-11-16 14:47:54 +11:00
Paul Gortmaker	4b16f8e2d6	powerpc: various straight conversions from module.h --> export.h All these files were including module.h just for the basic EXPORT_SYMBOL infrastructure. We can shift them off to the export.h header which is a way smaller footprint and thus realize some compile time gains. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:30:44 -04:00
Linus Torvalds	82aff107f8	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (152 commits) powerpc: Fix hard CPU IDs detection powerpc/pmac: Update via-pmu to new syscore_ops powerpc/kvm: Fix the build for 32-bit Book 3S (classic) processors powerpc/kvm: Fix kvmppc_core_pending_dec powerpc: Remove last piece of GEMINI powerpc: Fix for Pegasos keyboard and mouse powerpc: Make early memory scan more resilient to out of order nodes powerpc/pseries/iommu: Cleanup ddw naming powerpc/pseries/iommu: Find windows after kexec during boot powerpc/pseries/iommu: Remove ddw property when destroying window powerpc/pseries/iommu: Add additional checks when changing iommu mask powerpc/pseries/iommu: Use correct return type in dupe_ddw_if_already_created powerpc: Remove unused/obsolete CONFIG_XICS misc: Add CARMA DATA-FPGA Programmer support misc: Add CARMA DATA-FPGA Access Driver powerpc: Make IRQ_NOREQUEST last to clear, first to set powerpc: Integrated Flash controller device tree bindings powerpc/85xx: Create dts of each core in CAMP mode for P1020RDB powerpc/85xx: Fix PCIe IDSEL for Px020RDB powerpc/85xx: P2020 DTS: re-organize dts files ...	2011-05-20 13:28:01 -07:00
Linus Torvalds	268bb0ce3e	sanitize <linux/prefetch.h> usage Commit `e66eed651f` ("list: remove prefetching from regular list iterators") removed the include of prefetch.h from list.h, which uncovered several cases that had apparently relied on that rather obscure header file dependency. So this fixes things up a bit, using grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]') grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]') to guide us in finding files that either need <linux/prefetch.h> inclusion, or have it despite not needing it. There are more of them around (mostly network drivers), but this gets many core ones. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-20 12:50:29 -07:00
Milton Miller	a56555e573	powerpc: Remove alloc_maybe_bootmem for zalloc version Replace all remaining callers of alloc_maybe_bootmem with zalloc_maybe_bootmem. The callsite in pci_dn is followed with a memset to clear the memory, and not zeroing at the other callsites in the celleb fake pci code could lead to following uninitialized memory as pointers or even freeing said pointers on error paths. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-19 15:30:57 +10:00
Anton Blanchard	40f1ce7fb7	powerpc: Remove ioremap_flags We have a confusing number of ioremap functions. Make things just a bit simpler by merging ioremap_flags and ioremap_prot. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-19 14:30:43 +10:00
Anton Blanchard	d988f0e3f8	powerpc: Simplify 4k/64k copy_page logic To make it easier to add optimised versions of copy_page, remove the 4kB loop for 64kB pages and just do all the work in copy_page. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-19 14:30:42 +10:00
Michael Ellerman	b91e136cdf	powerpc: Use MSR_64BIT in sstep.c, fix kprobes on BOOK3E We check MSR_SF a lot in sstep.c, to decide if we need to emulate the truncation of values when running in 32-bit mode. Factor out that code into a helper, and convert it and the other uses to use MSR_64BIT. This fixes a bug on BOOK3E where kprobes would end up returning to a 32-bit address, because regs->nip was truncated, because (msr & MSR_SF) was false. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-27 14:18:46 +10:00
Michael Ellerman	c0337288ab	powerpc: Ensure the else case of feature sections will fit When we create an alternative feature section, the else case must be the same size or smaller than the body. This is because when we patch the else case in we just overwrite the body, so there must be room. Up to now we just did this by inspection, but it's quite easy to enforce it in the assembler, so we should. The only change is to add the ifgt block, but that affects the alignment of the tabs and so the whole macro is modified. Also add a test, but #if 0 it because we don't want to break the build. Anyone who's modifying the feature macros should enable the test. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-01-21 14:08:33 +11:00
Anton Blanchard	b5f9b6665b	powerpc: Hardcode popcnt instructions for old assemblers The popcnt instructions went into binutils relatively recently. As with a number of other instructions, create macros and hardcode them. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-12-09 15:35:30 +11:00
Anton Blanchard	64ff312876	powerpc: Add support for popcnt instructions POWER5 added popcntb, and POWER7 added popcntw and popcntd. As a first step this patch does all the work out of line, but it would be nice to implement them as inlines with an out of line fallback. The performance issue with hweight was noticed when disabling SMT on a large (192 thread) POWER7 box. The patch improves that testcase by about 8%. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-29 15:48:17 +11:00
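Since popcntb yields a population count per byte, a full-word count still needs the per-byte results summed. The sketch below shows one way to do that fold; `popcntb_sw` is a software stand-in for the instruction and `hweight64_sketch` is a hypothetical name, so this is not the code from the commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Software stand-in for popcntb: each byte of the result holds the
 * number of set bits in the corresponding byte of x.
 */
uint64_t popcntb_sw(uint64_t x)
{
    x = x - ((x >> 1) & UINT64_C(0x5555555555555555));
    x = (x & UINT64_C(0x3333333333333333)) +
        ((x >> 2) & UINT64_C(0x3333333333333333));
    return (x + (x >> 4)) & UINT64_C(0x0f0f0f0f0f0f0f0f);
}

/*
 * Fold the eight per-byte counts into one total. Multiplying by
 * 0x0101...01 accumulates the sum of all bytes in the top byte.
 */
unsigned int hweight64_sketch(uint64_t x)
{
    uint64_t per_byte = popcntb_sw(x);
    unsigned int total =
        (unsigned int)((per_byte * UINT64_C(0x0101010101010101)) >> 56);

    assert(total <= 64);
    return total;
}
```

With popcntd (added in POWER7 according to the commit), the whole computation is a single instruction, which is why the commit distinguishes the two cases.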
matt mooney	4108d9ba90	powerpc/Makefiles: Change to new flag variables Replace EXTRA_CFLAGS with ccflags-y and EXTRA_AFLAGS with asflags-y. Signed-off-by: matt mooney <mfm@muteddisk.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-10-13 16:19:22 +11:00
Sean MacLennan	cd64d1697c	powerpc: mtmsrd not defined Replace the BOOK3S_64 specific mtmsrd with the generic MTMSRD macro. Only enable ldstfp when CONFIG_PPC_FPU is set. Signed-off-by: Sean MacLennan <smaclennan@pikatech.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-09-02 14:07:34 +10:00
Sean MacLennan	025c0186a0	powerpc: Fix incorrect .stabs entry for copy_32.S Signed-off-by: Sean MacLennan <smaclennan@pikatech.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-09-02 14:07:34 +10:00
Paul Mackerras	8154c5d22d	powerpc: Abstract indexing of lppaca structs Currently we have the lppaca structs as a simple array of NR_CPUS entries, taking up space in the data section of the kernel image. In future we would like to allocate them dynamically, so this abstracts out the accesses to the array, making it easier to change how we locate the lppaca for a given cpu in future. Specifically, lppaca[cpu] changes to lppaca_of(cpu). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-09-02 14:07:31 +10:00
Anton Blanchard	8c77391475	powerpc: Add 64bit csum_and_copy_to_user This adds the equivalent of csum_and_copy_from_user for the receive side so we can copy and checksum in one pass. It is modelled on the generic checksum routine. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-09-02 14:07:30 +10:00
Anton Blanchard	fdd374b62c	powerpc: Optimise 64bit csum_partial_copy_generic and add csum_and_copy_from_user We use the same core loop as the new csum_partial, adding in the stores and exception handling code. To keep things simple we do all the exception fixup in csum_and_copy_from_user. This wrapper function is modelled on the generic checksum code and is careful to always calculate a complete checksum even if we only copied part of the data to userspace. To test this I forced checksumming on over loopback and ran socklib (a simple TCP benchmark). On a POWER6 575 throughput improved by 19% with this patch. If I forced both the sender and receiver onto the same cpu (with the hope of shifting the benchmark from being cache bandwidth limited to cpu limited), adding this patch improved performance by 55% Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-09-02 14:07:30 +10:00
Anton Blanchard	9b83ecb0a3	powerpc: Optimise 64bit csum_partial The main loop of csum_partial runs very slowly on recent POWER CPUs. After some analysis on both POWER6 and POWER7 I came up with the routine below. First we get the source aligned to a double word, ignoring any odd alignment to keep things simple. Then we do 64 bytes at a time, with an entry and exit limb of a further 64 bytes. On both POWER6 and POWER7 this should be as fast as we can go since we are limited by the latency of the adde instructions. To test this I forced checksumming on over loopback and ran socklib (a simple TCP benchmark). On a POWER6 575 throughput improved by 11% with this patch. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-09-02 14:07:29 +10:00
Benjamin Herrenschmidt	5f07aa7524	Merge commit 'paulus-perf/master' into next	2010-07-09 11:25:48 +10:00
Stephen Rothwell	3880ecb05b	powerpc: Fix feature-fixup tests for gcc 4.5 The feature-fixup tests declare some extern void variables and then take their addresses. Fix this by declaring them as extern u8 instead. Fixes these warnings (treated as errors): CC arch/powerpc/lib/feature-fixups.o cc1: warnings being treated as errors arch/powerpc/lib/feature-fixups.c: In function 'test_cpu_macros': arch/powerpc/lib/feature-fixups.c:293:23: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:294:9: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:297:2: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:297:2: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c: In function 'test_fw_macros': arch/powerpc/lib/feature-fixups.c:306:23: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:307:9: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:310:2: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:310:2: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c: In function 'test_lwsync_macros': arch/powerpc/lib/feature-fixups.c:321:23: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:322:9: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:326:3: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:326:3: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:329:3: error: taking address of expression of type 'void' arch/powerpc/lib/feature-fixups.c:329:3: error: taking address of expression of type 'void' Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-08 18:11:41 +10:00
Stephen Rothwell	7fca5dc8aa	powerpc: Fix module building for gcc 4.5 and 64 bit Gcc 4.5 is now generating out of line register save and restore in the function prefix and postfix when we use -Os. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-08 18:11:38 +10:00
K.Prasad	5aae8a5370	powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors Implement perf-events based hw-breakpoint interfaces for PowerPC 64-bit server (Book III S) processors. This allows access to a given location to be used as an event that can be counted or profiled by the perf_events subsystem. This is done using the DABR (data breakpoint register), which can also be used for process debugging via ptrace. When perf_event hw_breakpoint support is configured in, the perf_event subsystem manages the DABR and arbitrates access to it, and ptrace then creates a perf_event when it is requested to set a data breakpoint. [Adopted suggestions from Paul Mackerras <paulus@samba.org> to - emulate_step() all system-wide breakpoints and single-step only the per-task breakpoints - perform arch-specific cleanup before unregistration through arch_unregister_hw_breakpoint() ] Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2010-06-22 19:40:50 +10:00
Paul Mackerras	0016a4cf55	powerpc: Emulate most Book I instructions in emulate_step() This extends the emulate_step() function to handle a large proportion of the Book I instructions implemented on current 64-bit server processors. The aim is to handle all the load and store instructions used in the kernel, plus all of the instructions that appear between l[wd]arx and st[wd]cx., so this handles the Altivec/VMX lvx and stvx and the VSX lxv2dx and stxv2dx instructions (implemented in POWER7). The new code can emulate user mode instructions, and checks the effective address for a load or store if the saved state is for user mode. It doesn't handle little-endian mode at present. For floating-point, Altivec/VMX and VSX instructions, it checks that the saved MSR has the enable bit for the relevant facility set, and if so, assumes that the FP/VMX/VSX registers contain valid state, and does loads or stores directly to/from the FP/VMX/VSX registers, using assembly helpers in ldstfp.S. Instructions supported now include: * Loads and stores, including some but not all VMX and VSX instructions, and lmw/stmw * Atomic loads and stores (l[dw]arx, st[dw]cx.) * Arithmetic instructions (add, subtract, multiply, divide, etc.) * Compare instructions * Rotate and mask instructions * Shift instructions * Logical instructions (and, or, xor, etc.) * Condition register logical instructions * mtcrf, cntlz[wd], exts[bhw] * isync, sync, lwsync, ptesync, eieio * Cache operations (dcbf, dcbst, dcbt, dcbtst) The overflow-checking arithmetic instructions are not included, but they appear not to be ever used in C code. This uses decimal values for the minor opcodes in the switch statements because that is what appears in the Power ISA specification, thus it is easier to check that they are correct if they are in decimal. If this is used to single-step an instruction where a data breakpoint interrupt occurred, then there is the possibility that the instruction is a lwarx or ldarx. 
In that case we have to be careful not to lose the reservation until we get to the matching st[wd]cx., or we'll never make forward progress. One alternative is to try to arrange that we can return from interrupts and handle data breakpoint interrupts without losing the reservation, which means not using any spinlocks, mutexes, or atomic ops (including bitops). That seems rather fragile. The other alternative is to emulate the larx/stcx and all the instructions in between. This is why this commit adds support for a wide range of integer instructions. Signed-off-by: Paul Mackerras <paulus@samba.org>	2010-06-22 19:40:29 +10:00
Andreas Schwab	ca5d0674c3	powerpc: Fix string library functions The powerpc strncmp implementation does not correctly handle a zero length, despite the claim in `0119536cd3` (Add hand-coded assembly strcmp). Additionally, all the length arguments are size_t, not int, so use PPC_LCMPI and eq instead of cmpwi and le throughout. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-05-21 17:31:08 +10:00
Jeff Mahoney	637a99022f	powerpc: Fix handling of strncmp with zero len Commit `0119536c`, which added the assembly version of strncmp to powerpc, mentions that it adds two instructions to the version from boot/string.S to allow it to handle len=0. Unfortunately, it doesn't always return 0 when that is the case. The length is passed in r5, but the return value is passed back in r3. In certain cases, this will happen to work. Otherwise it will pass back the address of the first string as the return value. This patch lifts the len <= 0 handling code from memcpy to handle that case. Reported by: Christian_Sellars@symantec.com Signed-off-by: Jeff Mahoney <jeffm@suse.com> CC: <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-04-07 18:00:39 +10:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. 
This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Benjamin Herrenschmidt	3d98ffbffb	powerpc: Fix lwsync feature fixup vs. modules on 64-bit Anton's commit enabling the use of the lwsync fixup mechanism on 64-bit breaks modules. The lwsync fixup section uses .long instead of the FTR_ENTRY_OFFSET macro used by other fixups sections, and thus will generate 32-bit relocations that our module loader cannot resolve. This changes it to use the same type as other feature sections. Note however that we might want to consider using 32-bit for all the feature fixup offsets and add support for R_PPC_REL32 to module_64.c instead as that would reduce the size of the kernel image. I'll leave that as an exercise for the reader for now... Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-26 18:29:17 +11:00
Anton Blanchard	789c299ca2	powerpc: Improve 64bit copy_tofrom_user Here is a patch from Paul Mackerras that improves the ppc64 copy_tofrom_user. The loop now does 32 bytes at a time as well as pairing loads and stores. A quick test case that reads 8kB over and over shows the improvement: POWER6: 53% faster POWER7: 51% faster #define _XOPEN_SOURCE 500 #include <stdlib.h> #include <stdio.h> #include <unistd.h> #include <fcntl.h> #include <sys/types.h> #include <sys/stat.h> #define BUFSIZE (8 * 1024) #define ITERATIONS 10000000 int main() { char tmpfile[] = "/tmp/copy_to_user_testXXXXXX"; int fd; char *buf[BUFSIZE]; unsigned long i; fd = mkstemp(tmpfile); if (fd < 0) { perror("open"); exit(1); } if (write(fd, buf, BUFSIZE) != BUFSIZE) { perror("open"); exit(1); } for (i = 0; i < 10000000; i++) { if (pread(fd, buf, BUFSIZE, 0) != BUFSIZE) { perror("pread"); exit(1); } } unlink(tmpfile); return 0; } Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-17 14:03:16 +11:00
Anton Blanchard	63e6c5b810	powerpc: Pair loads and stores in copy_4k_page A number of our chips like loads and stores to be paired. A small kernel module testcase shows the improvement of pairing loads and stores in copy_4k_page: POWER6: +9% POWER7: +1.5% #include <linux/module.h> #include <linux/mm.h> #define ITERATIONS 10000000 static int __init copypage_init(void) { struct timespec before, after; unsigned long i; struct page *destpage, *srcpage; char *dest, *src; destpage = alloc_page(GFP_KERNEL); srcpage = alloc_page(GFP_KERNEL); dest = page_address(destpage); src = page_address(srcpage); getnstimeofday(&before); for (i = 0; i < ITERATIONS; i++) copy_4K_page(dest, src); getnstimeofday(&after); free_page((unsigned long)dest); free_page((unsigned long)src); printk(KERN_DEBUG "copy_4K_page loop took %lu ns\n", (after.tv_sec - before.tv_sec) * NSEC_PER_SEC + (after.tv_nsec - before.tv_nsec)); return 0; } static void __exit copypage_exit(void) { } module_init(copypage_init) module_exit(copypage_exit) MODULE_LICENSE("GPL"); MODULE_AUTHOR("Anton Blanchard"); Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-17 14:03:16 +11:00
Anton Blanchard	53eae2281a	powerpc: Fix lwsync patching code on 64bit do_lwsync_fixups doesn't work on 64bit, we end up writing lwsyncs to the wrong addresses: 0:mon> di c0000001000bfacc c0000001000bfacc 7c2004ac lwsync Since the lwsync section has negative offsets we need to use a signed int pointer so we sign extend the value. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-17 14:03:15 +11:00
Thomas Gleixner	fb3a6bbc91	locking: Convert raw_rwlock to arch_rwlock Not strictly necessary for -rt as -rt does not have non sleeping rwlocks, but it's odd to not have a consistent naming convention. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org	2009-12-14 23:55:32 +01:00
Thomas Gleixner	0199c4e68d	locking: Convert __raw_spin* functions to arch_spin* Name space cleanup. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org	2009-12-14 23:55:32 +01:00
Thomas Gleixner	445c89514b	locking: Convert raw_spinlock to arch_spinlock The raw_spin* namespace was taken by lockdep for the architecture specific implementations. raw_spin_* would be the ideal name space for the spinlocks which are not converted to sleeping locks in preempt-rt. Linus suggested to convert the raw_ to arch_ locks and cleanup the name space instead of using an artificial name like core_spin, atomic_spin or whatever. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org	2009-12-14 23:55:32 +01:00
Benjamin Herrenschmidt	bcd6acd51f	Merge commit 'origin/master' into next Conflicts: include/linux/kvm.h	2009-12-09 17:14:38 +11:00
Joakim Tjernlund	15d914d72a	powerpc/8xx: Start using dcbX instructions in various copy routines Now that 8xx can fixup dcbX instructions, start using them where possible like every other PowerPC arch does. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-09 17:10:37 +11:00
Anton Blanchard	3cd980dbc1	powerpc: perf_event: Cleanup copy_page output by hiding setup symbol A lot of hits in "setup" doesn't make much sense, so hide this symbol and allow all the hits to end up in copy_4k_page. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2009-10-28 16:13:05 +11:00
Michael Ellerman	ba55bd7436	powerpc: Add configurable -Werror for arch/powerpc Add the option to build the code under arch/powerpc with -Werror. The intention is to make it harder for people to inadvertently introduce warnings in the arch/powerpc code. It needs to be configurable so that if a warning is introduced, people can easily work around it while it's being fixed. The option is a negative, ie. don't enable -Werror, so that it will be turned on for allyes and allmodconfig builds. The default is n, in the hope that developers will build with -Werror, that will probably lead to some build breaks, I am prepared to be flamed. It's not enabled for math-emu, which is a steaming pile of warnings. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-16 14:15:45 +10:00
Benjamin Herrenschmidt	b16e7766d6	powerpc: Move dma-noncoherent.c from arch/powerpc/lib to arch/powerpc/mm (pre-requisite to make the next patches more palatable) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-27 16:32:05 +10:00
Benjamin Herrenschmidt	84532a0fc3	Revert "powerpc: Rework dma-noncoherent to use generic vmalloc layer" This reverts commit `33f00dcedb`. While it was a good idea to try to use the mm/vmalloc.c allocator instead of our own (in fact, ours is itself a dup on an old variant of the vmalloc one), unfortunately, the approach is terminally busted since dma_alloc_coherent() can be called at interrupt time or in atomic contexts and there's little chances we'll make the code in mm/vmalloc.c cope with that :-( Until we can get the generic code to forbid that idiocy and fix all drivers abusing it, we pretty much have no choice but revert to our custom virtual space allocator. There's also a problem with SMP safety since freeing such mapping would require an IPI which cannot be done at interrupt time. However, right now, I don't think we support any platform that is both SMP and has non-coherent DMA (don't laugh, I know such things do exist !) so we can sort that out later. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-27 13:33:14 +10:00
Benjamin Herrenschmidt	e14eee56c2	Merge commit 'origin/master' into next	2009-03-11 17:10:07 +11:00
Mark Nelson	f72b728bf1	powerpc: Fix 64bit __copy_tofrom_user() regression This fixes a regression introduced by commit `a4e22f02f5` ("powerpc: Update 64bit __copy_tofrom_user() using CPU_FTR_UNALIGNED_LD_STD"). The same bug that existed in the 64bit memcpy() also exists here so fix it here too. The fix is the same as that applied to memcpy() with the addition of fixes for the exception handling code required for __copy_tofrom_user(). This stops us reading beyond the end of the source region we were told to copy. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-26 14:02:54 +11:00
Mark Nelson	e423b9ecd6	powerpc: Fix 64bit memcpy() regression This fixes a regression introduced by commit `25d6e2d7c5` ("powerpc: Update 64bit memcpy() using CPU_FTR_UNALIGNED_LD_STD"). This commit allowed CPUs that have the CPU_FTR_UNALIGNED_LD_STD CPU feature bit present to do the memcpy() with unaligned load doubles. But, along with this came a bug where our final load double would read bytes beyond a page boundary and into the next (unmapped) page. This was caught by enabling CONFIG_DEBUG_PAGEALLOC. The fix was to read only the number of bytes that we need to store rather than reading a full 8-byte doubleword and storing only a portion of that. In order to minimise the amount of existing code touched we use the original do_tail for the src_unaligned case. Below is an example of the regression, as reported by Sachin Sant: Unable to handle kernel paging request for data at address 0xc00000003f380000 Faulting instruction address: 0xc000000000039574 cpu 0x1: Vector: 300 (Data Access) at [c00000003baf3020] pc: c000000000039574: .memcpy+0x74/0x244 lr: d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3] sp: c00000003baf32a0 msr: 8000000000009032 dar: c00000003f380000 dsisr: 40000000 current = 0xc00000003e54b010 paca = 0xc000000000a53680 pid = 1840, comm = readahead enter ? 
for help [link register ] d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3] [c00000003baf32a0] d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3] (unreliable) [c00000003baf3390] d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3] [c00000003baf3400] c000000000148154 .generic_getxattr+0x74/0x9c [c00000003baf34a0] c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678 [c00000003baf3560] c00000000032c6b0 .security_d_instantiate+0x50/0x68 [c00000003baf35e0] c00000000013c818 .d_instantiate+0x78/0x9c [c00000003baf3680] c00000000013ced0 .d_splice_alias+0xf0/0x120 [c00000003baf3720] d00000000243e05c .ext3_lookup+0xec/0x134 [ext3] [c00000003baf37c0] c000000000131e74 .do_lookup+0x110/0x260 [c00000003baf3880] c000000000134ed0 .__link_path_walk+0xa98/0x1010 [c00000003baf3970] c0000000001354a0 .path_walk+0x58/0xc4 [c00000003baf3a20] c000000000135720 .do_path_lookup+0x138/0x1e4 [c00000003baf3ad0] c00000000013645c .path_lookup_open+0x6c/0xc8 [c00000003baf3b70] c000000000136780 .do_filp_open+0xcc/0x874 [c00000003baf3d10] c0000000001251e0 .do_sys_open+0x80/0x140 [c00000003baf3dc0] c00000000016aaec .compat_sys_open+0x24/0x38 [c00000003baf3e30] c00000000000855c syscall_exit+0x0/0x40 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-26 14:02:53 +11:00
Ilya Yanok	33f00dcedb	powerpc: Rework dma-noncoherent to use generic vmalloc layer This patch rewrites consistent dma allocations support to use vmalloc layer to allocate virtual memory space from vmalloc pool and get rid of CONFIG_CONSISTENT_{START,SIZE}. This greatly simplifies the code by effectively removing a custom allocator we had for virtual space. Signed-off-by: Ilya Yanok <yanok@emcraft.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 10:48:57 +11:00
Kumar Gala	16c57b3620	powerpc: Unify opcode definitions and support Create a new header that becomes a single location for defining PowerPC opcodes used by code that is either generating instructions at runtime (fixups, debug, etc.), emulating instructions, or just compiling instructions old assemblers don't know about. We currently don't handle the floating point emulation or alignment decode as both are better handled by the specific decode support they already have. Added support for the new dcbzl, dcbal, msgsnd, tlbilx, & wait instructions since older assemblers don't know about them. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 10:48:56 +11:00
Ananth N Mavinakayanahalli	eef336189b	powerpc: Don't emulate mr. instructions Currently emulate_step() emulates mr. instructions without updating cr0 and this can be disastrous. Don't emulate mr. This bug has been around for a while, but I am not sure if it's a worthy -stable candidate. I'll leave it to Ben to decide. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-10 14:39:07 +11:00
Linus Torvalds	3c92ec8ae9	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits) powerpc/44x: Support 16K/64K base page sizes on 44x powerpc: Force memory size to be a multiple of PAGE_SIZE powerpc/32: Wire up the trampoline code for kdump powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M powerpc/32: Allow __ioremap on RAM addresses for kdump kernel powerpc/32: Setup OF properties for kdump powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs() powerpc: Prepare xmon_save_regs for use with kdump powerpc: Remove default kexec/crash_kernel ops assignments powerpc: Make default kexec/crash_kernel ops implicit powerpc: Setup OF properties for ppc32 kexec powerpc/pseries: Fix cpu hotplug powerpc: Fix KVM build on ppc440 powerpc/cell: add QPACE as a separate Cell platform powerpc/cell: fix build breakage with CONFIG_SPUFS disabled powerpc/mpc5200: fix error paths in PSC UART probe function powerpc/mpc5200: add rts/cts handling in PSC UART driver powerpc/mpc5200: Make PSC UART driver update serial errors counters powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver ... Fix trivial conflict in drivers/char/Makefile as per Paul's directions	2008-12-28 16:54:33 -08:00
David Howells	8168b5400b	powerpc: Rename struct vm_region to avoid conflict with NOMMU Rename PowerPC's struct vm_region so that I can introduce my own global version for NOMMU. It's feasible that the PowerPC version may wish to use my global one instead. The NOMMU vm_region struct defines areas of the physical memory map that are under mmap. This may include chunks of RAM or regions of memory mapped devices, such as flash. It is also used to retain copies of file content so that shareable private memory mappings of files can be made. As such, it may be compatible with what is described in the banner comment for PowerPC's vm_region struct. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:14 +11:00
Ingo Molnar	30cd324e97	Merge branches 'tracing/ftrace', 'tracing/ring-buffer' and 'tracing/urgent' into tracing/core Conflicts: include/linux/ftrace.h	2008-12-19 09:42:40 +01:00
Paul Mackerras	c280266a32	Merge branch 'linux-2.6' into next	2008-12-18 11:06:12 +11:00
Guillaume Knispel	af4d364386	powerpc: Fix corruption error in rh_alloc_fixed() There is an error in rh_alloc_fixed() of the Remote Heap code: If there is at least one free block blk won't be NULL at the end of the search loop, so -ENOMEM won't be returned and the else branch of "if (bs == s || be == e)" will be taken, corrupting the management structures. Signed-off-by: Guillaume Knispel <gknispel@proformatique.com> Acked-by: Timur Tabi <timur@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2008-12-17 10:06:14 -06:00
Steven Rostedt	f1eecf0e4f	powerpc/ppc32: static ftrace fixes for PPC32 Impact: fix for PowerPC 32 code There were some early init code that was not safe for static ftrace to boot on my PowerBook. This code must only use relative addressing, and static mcount performs a compare of the ftrace_trace_function pointer, and gets that with an absolute address. In the early init boot up code, this will cause a fault. This patch removes tracing from the files containing the offending functions. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-28 14:08:07 +01:00
Mark Nelson	a4e22f02f5	powerpc: Update 64bit __copy_tofrom_user() using CPU_FTR_UNALIGNED_LD_STD In exactly the same way that we updated memcpy() with new feature sections in commit `25d6e2d7c5` ("powerpc: Update 64bit memcpy() using CPU_FTR_UNALIGNED_LD_STD"), we do the same thing here for __copy_tofrom_user(). Once again this is purely a performance tweak for Cell and Power6 - this has no effect on all the other 64bit powerpc chips. We can make these same changes to __copy_tofrom_user() because the basic copy algorithm is the same as in memcpy() - this version just has all the exception handling logic needed when copying to or from userspace as well as a special case for copying whole 4K pages that are page aligned. CPU_FTR_UNALIGNED_LD_STD CPU was added in commit `4ec577a289` ("powerpc: Add new CPU feature: CPU_FTR_UNALIGNED_LD_STD"). We also make the same simple one line change from cmpldi r1,... to cmpldi cr1,... for consistency. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-11-19 16:04:54 +11:00
Hollis Blanchard	7526ff76f8	powerpc: Remove superfluous WARN_ON() from dma-noncoherent.c I can't tell why this WARN_ON exists, and there's no comment explaining it. Whether the pmd is present or not, pte_alloc_kernel() seems to handle both cases. Booting a 440 kernel with 64K PAGE_SIZE triggers the warning, but boot successfully completes and I see no problems beyond that. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-11-19 16:04:52 +11:00
Mark Nelson	25d6e2d7c5	powerpc: Update 64bit memcpy() using CPU_FTR_UNALIGNED_LD_STD Update memcpy() to add two new feature sections: one for aligning the destination before copying and one for copying using aligned load and store doubles. These new feature sections will only affect Power6 and Cell because the CPU feature bit was only added to these two processors. Power6 gets its best performance in memcpy() when aligning neither the source nor the destination, while Cell gets its best performance when just the destination is aligned. But in order to save on CPU feature bits we can use the previously added CPU_FTR_CP_USE_DCBTZ feature bit to differentiate between Power6 and Cell (because CPU_FTR_CP_USE_DCBTZ was added to Cell but not Power6). The first feature section acts to nop out the branch that takes us to the code that aligns us to an eight byte boundary for the destination. We only want to nop out this branch on Power6. So the ALT_FTR_SECTION_END() for this feature section creates a test mask of the two feature bits ORed together and provides an expected result of just CPU_FTR_UNALIGNED_LD_STD, thus we nop out the branch if we're on a CPU that has CPU_FTR_UNALIGNED_LD_STD set and CPU_FTR_CP_USE_DCBTZ unset. For the second feature section added, if we're on a CPU that has the CPU_FTR_UNALIGNED_LD_STD bit set then we don't want to do the copy with aligned loads and stores (and the appropriate shifting left and right instructions), so we want to nop out the branch to .Lsrc_unaligned. The andi. used for this branch is moved to just above the branch because this allows us to nop out both instructions with just one feature section which gives us better performance and doesn't hurt readability which two separate feature sections did. Moving the andi. to just above the branch doesn't have any noticeable negative effect on the remaining 64bit processors (the ones that didn't have this feature bit added). 
On Cell this simple modification results in an improvement to measured memcpy() bandwidth of up to 50% in the hot cache case and up to 15% in the cold cache case. On Power6 we get memory bandwidth results that are up to three times faster in the hot cache case and up to 50% faster in the cold cache case. Commit `2a9294369b` ("powerpc: Add new CPU feature: CPU_FTR_CP_USE_DCBTZ") was where CPU_FTR_CP_USE_DCBTZ was added. To say that Cell gets its best performance in memcpy() with just the destination aligned is true but only for the reason that the indirect shift and rotate instructions, sld and srd, are microcoded on Cell. This means that either the destination or the source can be aligned, but not both, and seeing as we get better performance with the destination aligned we choose this option. While we're at it make a one line change from cmpldi r1,... to cmpldi cr1,... for consistency. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-11-05 22:08:29 +11:00
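The mask-and-expected-value test described in this commit can be sketched in C. This is only an illustration of the comparison, not the kernel's patching code: the bit positions and the helper name `section_matches` are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions, chosen for illustration only. */
#define FTR_UNALIGNED_LD_STD (UINT64_C(1) << 0)
#define FTR_CP_USE_DCBTZ     (UINT64_C(1) << 1)

/*
 * Hypothetical helper: a feature section "matches" when the CPU's feature
 * bits, restricted to the test mask, equal the expected value.
 */
static bool section_matches(uint64_t cpu_features, uint64_t mask, uint64_t value)
{
    /* The expected value may only use bits that the mask selects. */
    assert((value & ~mask) == 0);
    return (cpu_features & mask) == value;
}
```

With a mask of both bits ORed together and an expected value of just `FTR_UNALIGNED_LD_STD`, the test is true for a CPU with only that bit set (the Power6 case in the message) and false for a CPU with both bits set (the Cell case) or with neither.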
Benjamin Herrenschmidt	8aa2659009	powerpc: Fix DMA offset for non-coherent DMA After Becky's work we can almost have different DMA offsets between on-chip devices and PCI. Almost because there's a problem with the non-coherent DMA code that basically ignores the programmed offset to use the global one for everything. This fixes it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-14 10:35:26 +11:00
Mark Nelson	57dda6ef5b	powerpc: New copy_4K_page() This new copy_4K_page() function was originally tuned for the best performance on the Cell processor, but after testing on more 64bit powerpc chips it was found that with a small modification it either matched the performance offered by the current mainline version or bettered it by a small amount. It was found that on a Cell-based QS22 blade the amount of system time measured when compiling a 2.6.26 pseries_defconfig decreased by 4%. Using the same test, a 4-way 970MP machine saw a decrease of 2% in system time. No noticeable change was seen on Power4, Power5 or Power6. The 4096 byte page is copied in thirty-two 128 byte strides. An initial setup loop executes dcbt instructions for the whole source page and dcbz instructions for the whole destination page. To do this, the cache line size is retrieved from ppc64_caches. A new CPU feature bit, CPU_FTR_CP_USE_DCBTZ, (introduced in the previous patch) is used to make the modification to this new copy routine - on Power4, 970 and Cell the feature bit is set so the setup loop is executed, but on all other 64bit chips the setup loop is nop'ed out. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-09-15 11:07:42 -07:00
Kumar Gala	9c4cb82515	powerpc: Remove use of CONFIG_PPC_MERGE Now that arch/ppc is gone and CONFIG_PPC_MERGE is always set, remove the dead code associated with !CONFIG_PPC_MERGE from arch/powerpc and include/asm-powerpc. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-04 13:18:17 +10:00
Andrea Righi	27ac792ca0	PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit boundary. For example: u64 val = PAGE_ALIGN(size); always returns a value < 4GB even if size is greater than 4GB. The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for example): #define PAGE_SHIFT 12 #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) #define PAGE_MASK (~(PAGE_SIZE-1)) ... #define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK) The "~" is performed on a 32-bit value, so everything in "and" with PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary. Using the ALIGN() macro seems to be the right way, because it uses typeof(addr) for the mask. Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in include/linux/mm.h. See also lkml discussion: http://lkml.org/lkml/2008/6/11/237 [akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c] [akpm@linux-foundation.org: fix v850] [akpm@linux-foundation.org: fix powerpc] [akpm@linux-foundation.org: fix arm] [akpm@linux-foundation.org: fix mips] [akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c] [akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c] [akpm@linux-foundation.org: fix powerpc] Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:21 -07:00
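The truncation described in this commit can be reproduced in portable C by modelling the 32-bit `unsigned long` with `uint32_t`. This is a sketch under that assumption; the names `page_align_old`, `page_align_fixed` and `align_up` are invented for the example and are not the kernel macros.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
/* Model of a 32-bit architecture: the page size and mask are 32 bits wide. */
#define PAGE_SIZE_32 ((uint32_t)1 << DEMO_PAGE_SHIFT)
#define PAGE_MASK_32 (~(PAGE_SIZE_32 - 1))

/* Old behaviour: the 32-bit mask is zero-extended, clearing bits 32..63. */
static uint64_t page_align_old(uint64_t addr)
{
    return (addr + PAGE_SIZE_32 - 1) & PAGE_MASK_32;
}

/* Round x up to a multiple of a, building the mask in x's own width. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0); /* a must be a power of two */
    return (x + a - 1) & ~(a - 1);
}

/* Fixed behaviour: the mask is as wide as the value being aligned. */
static uint64_t page_align_fixed(uint64_t addr)
{
    return align_up(addr, PAGE_SIZE_32);
}
```

For an input of 4GB + 1 (`0x100000001`), the old form yields `0x1000` because the high word is masked away, while the fixed form yields `0x100001000`.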
Michael Ellerman	76bfdcf71c	powerpc: Use PPC_LONG and PPC_LONG_ALIGN in lib/string.S Replace ifdef clutter with the PPC_LONG and PPC_LONG_ALIGN macros for readability. No change to the generated code. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-22 10:39:35 +10:00
Michael Ellerman	1856c02040	powerpc: Use WARN_ON(1) instead of __WARN() __WARN() is not defined for all configs, use WARN_ON(1) instead. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-22 10:39:34 +10:00
Kumar Gala	2d1b202762	powerpc: Fixup lwsync at runtime To allow for a single kernel image on e500 v1/v2/mc we need to fixup lwsync at runtime. On e500v1/v2 lwsync causes an illop so we need to patch up the code. We default to 'sync' since that is always safe and if the cpu is capable we will replace 'sync' with 'lwsync'. We introduce CPU_FTR_LWSYNC as a way to determine at runtime if this is needed. This flag could be moved elsewhere since we don't really use it for the normal CPU_FTR purpose. Finally we only store the relative offset in the fixup section to keep it as small as possible rather than using a full fixup_entry. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-03 16:58:10 +10:00
Kumar Gala	5888da1876	powerpc: Fix building of feature-fixup tests on ppc32 We need to use PPC_LCMPI otherwise we get compile errors like: arch/powerpc/lib/feature-fixups-test.S: Assembler messages: arch/powerpc/lib/feature-fixups-test.S:142: Error: Unrecognized opcode: `cmpdi' arch/powerpc/lib/feature-fixups-test.S:149: Error: Unrecognized opcode: `cmpdi' arch/powerpc/lib/feature-fixups-test.S:164: Error: Unrecognized opcode: `cmpdi' Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-03 16:58:09 +10:00
Andrew Lewis	03d70617b8	powerpc: Prevent memory corruption due to cache invalidation of unaligned DMA buffer On PowerPC processors with non-coherent cache architectures the DMA subsystem calls invalidate_dcache_range() before performing a DMA read operation. If the address and length of the DMA buffer are not aligned to a cache-line boundary this can result in memory outside of the DMA buffer being invalidated in the cache. If this memory has an uncommitted store then the data will be lost and a subsequent read of that address will result in an old value being returned from main memory. Only when the DMA buffer starts on a cache-line boundary and is an exact multiple of the cache-line size can invalidate_dcache_range() be called, otherwise flush_dcache_range() must be called. flush_dcache_range() will first flush uncommitted writes, and then invalidate the cache. Signed-off-by: Andrew Lewis <andrew-lewis at netspace.net.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-01 11:28:54 +10:00
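The rule in this commit (invalidate only when the buffer covers whole cache lines, otherwise flush) comes down to one alignment check. The following is a minimal sketch of that decision, not the kernel code: the enum and both function names are hypothetical, and the cache-line size is passed in as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum cache_op { CACHE_INVALIDATE, CACHE_FLUSH };

/* Hypothetical helper: true when [start, start + len) spans whole lines only. */
static bool dma_buf_is_line_aligned(uintptr_t start, size_t len, size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    /* Both the start address and the length must be multiples of the line size. */
    return ((start | len) & (line_size - 1)) == 0;
}

/* Pick the cache operation to perform before a DMA read into the buffer. */
static enum cache_op pick_cache_op_for_dma_read(uintptr_t start, size_t len,
                                                size_t line_size)
{
    return dma_buf_is_line_aligned(start, len, line_size) ? CACHE_INVALIDATE
                                                          : CACHE_FLUSH;
}
```

A buffer that starts mid-line or ends mid-line shares a cache line with unrelated data, so the sketch falls back to the flush, which writes dirty data back before invalidating.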
Michael Ellerman	362e7701fd	powerpc: Add self-tests of the feature fixup code This commit adds tests of the feature fixup code, they are run during boot if CONFIG_FTR_FIXUP_SELFTEST=y. Some of the tests manually invoke the patching routines to check their behaviour, and others use the macros and so are patched during the normal patching done during boot. Because we have two sets of macros with different names, we use a macro to generate the test of the macros, very niiiice. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-01 11:28:30 +10:00
Michael Ellerman	9b1a735de6	powerpc: Add logic to patch alternative feature sections This commit adds the logic to patch alternative sections. This is fairly straightforward, except for branches. Relative branches that jump from inside the else section to outside of it need to be translated as they're moved, otherwise they will jump to the wrong location. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-01 11:28:29 +10:00
Michael Ellerman	fac23fe4be	powerpc: Introduce infrastructure for feature sections with alternatives The current feature section logic only supports nop'ing out code, this means if you want to choose at runtime between instruction sequences, one or both cases will have to execute the nop'ed out contents of the other section, eg: BEGIN_FTR_SECTION or 1,1,1 END_FTR_SECTION_IFSET(FOO) BEGIN_FTR_SECTION or 2,2,2 END_FTR_SECTION_IFCLR(FOO) and the resulting code will be either, or 1,1,1 nop or, nop or 2,2,2 For small code segments this is fine, but for larger code blocks and in performance critical code segments, it would be nice to avoid the nops. This commit starts to implement logic to allow the following: BEGIN_FTR_SECTION or 1,1,1 FTR_SECTION_ELSE or 2,2,2 ALT_FTR_SECTION_END_IFSET(FOO) and the resulting code will be: or 1,1,1 or, or 2,2,2 We achieve this by extending the existing FTR macros. The current feature section semantic just becomes a special case, ie. if the else case is empty we nop out the default case. The key limitation is that the size of the else case must be less than or equal to the size of the default case. If the else case is smaller the remainder of the section is nop'ed. We let the linker put the else case code in with the rest of the text, so that relative branches from the else case are more likely to link, this has the disadvantage that we can't free the unused else cases. This commit introduces the required macro and linker script changes, but does not enable the patching of the alternative sections. We also need to update two hand-made section entries in reg.h and timex.h Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-01 11:28:28 +10:00
Michael Ellerman	51c52e8669	powerpc: Split out do_feature_fixups() from cputable.c The logic to patch CPU feature sections lives in cputable.c, but these days it's used for CPU features as well as firmware features. Move it into its own file for neatness and as preparation for some additions. While we're moving the code, we pull the loop body logic into a separate routine, and remove a comment which doesn't apply anymore. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-01 11:28:24 +10:00
Michael Ellerman	ae0dc73625	powerpc: Add tests of the code patching routines Add tests of the existing code patching routines, as well as the new routines added in the last commit. The self-tests are run late in boot when CONFIG_CODE_PATCHING_SELFTEST=y, which depends on DEBUG_KERNEL=y. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-01 11:28:22 +10:00
Michael Ellerman	411781a290	powerpc: Add new code patching routines This commit adds some new routines for patching code, which will be used in a following commit. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-01 11:28:21 +10:00
Michael Ellerman	053a858efa	powerpc: Make create_branch() return errors if the branch target is too large If you pass a target value to create_branch() which is more than 32MB - 4, or - 32MB away from the branch site, then it's impossible to create an immediate branch. The current code doesn't check, which will lead to us creating a branch to somewhere else - which is bad. For code that cares to check we return 0, which is easy to check for, and for code that doesn't at least we'll be creating an illegal instruction, rather than a branch to some random address. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-01 11:28:19 +10:00
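The range check described in this commit follows from the PowerPC I-form branch encoding: primary opcode 18 (`0x48000000`), a 24-bit word offset, and the AA and LK bits in the low two bits. The function below is a sketch of that encoding with the error return of 0; the name `make_branch`, the flag names and the alignment check are assumptions of this example rather than a copy of the kernel routine.

```c
#include <assert.h>
#include <stdint.h>

#define BRANCH_SET_LINK 0x1u /* LK bit: branch and link */
#define BRANCH_ABSOLUTE 0x2u /* AA bit: target is an absolute address */

/*
 * Encode an immediate branch from `site` to `target`.
 * Returns 0 (not a valid instruction) when the target cannot be reached.
 */
static uint32_t make_branch(uint64_t site, uint64_t target, unsigned flags)
{
    int64_t offset = (int64_t)target;

    assert((flags & ~0x3u) == 0); /* only AA and LK are meaningful here */

    if (!(flags & BRANCH_ABSOLUTE))
        offset -= (int64_t)site;

    /* The signed 26-bit field covers -32MB .. 32MB - 4, in 4-byte steps. */
    if (offset < -0x2000000 || offset > 0x1fffffc || (offset & 0x3))
        return 0;

    return 0x48000000u | (flags & 0x3u) | ((uint32_t)offset & 0x03FFFFFCu);
}
```

A caller that cares can compare the result against 0 before patching; a caller that does not check ends up writing an illegal instruction word instead of a branch to an unintended address, which is the trade-off the message describes.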
Michael Ellerman	e7a57273c6	powerpc: Allow create_branch() to return errors Currently create_branch() creates a branch instruction for you, and patches it into the call site. In some circumstances it would be nice to be able to create the instruction and patch it later, and also some code might want to check for errors in the branch creation before doing the patching. A future commit will change create_branch() to check for errors. For callers that don't care, replace create_branch() with patch_branch(), which just creates the branch and patches it directly. While we're touching all the callers, change to using unsigned int , as this seems to match usage better. That allows (and requires) us to remove the volatile in the definition of vector in powermac/smp.c and mpc86xx_smp.c, that's correct because now that we're passing vector as an unsigned int the compiler knows that its value might change across the patch_branch() call. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Jon Loeliger <jdl@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-01 11:28:19 +10:00
Michael Ellerman	aaddd3eaca	powerpc: Move code patching code into arch/powerpc/lib/code-patching.c We currently have a few routines for patching code in asm/system.h, because they didn't fit anywhere else. I'd like to clean them up a little and add some more, so first move them into a dedicated C file - they don't need to be inlined. While we're moving the code, drop create_function_call(), its intended caller never got merged and will be replaced in future with something different. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-01 11:28:18 +10:00
Kumar Gala	da3de6df33	[POWERPC] Fix -Os kernel builds with newer gcc versions GCC 4.4.x looks to be adding support for generating out-of-line register saves/restores based on: http://gcc.gnu.org/ml/gcc-patches/2008-04/msg01678.html This breaks the kernel if we enable CONFIG_CC_OPTIMIZE_FOR_SIZE. To fix this we take the save/restore code from gcc and simplify it down for our needs (integer only). Additionally, we have to link this code into each module. The other solution was to add EXPORT_SYMBOL() which meant going through the trampoline which seemed nonsensical for these out-of-line routines. Finally, we add some checks to prom_init_check.sh to ignore the out-of-line save/restore functions. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-06-16 15:00:54 +10:00
Paul Mackerras	0d4b6b901c	[POWERPC] ppc: More compile fixes This fixes a few more miscellaneous compile problems with ARCH=ppc. 1. Don't compile devres.c on ARCH=ppc, it doesn't have ioremap_flags. 2. Include <asm/irq.h> in setup.c for the __DO_IRQ_CANON definition. 3. Include <linux/proc_fs.h> in residual.c for the definition of create_proc_read_entry. 4. Fix xchg_ptr to be a static inline to eliminate a compiler warning. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-12 22:57:51 +10:00
Emil Medve	b41e5fffe8	[POWERPC] devres: Add devm_ioremap_prot() We provide an ioremap_flags, so this provides a corresponding devm_ioremap_prot. The slight name difference is at Ben Herrenschmidt's request as he plans on changing ioremap_flags to ioremap_prot in the future. Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Tejun Heo <htejun@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-05 16:47:14 +10:00
Timur Tabi	3a2f020c5a	[POWERPC] Make rheap safe for spinlocks The rheap allocation function, rh_alloc, could call kmalloc with GFP_KERNEL. This can sleep, which means you couldn't hold a spinlock while calling rh_alloc. Change all kmalloc calls to use GFP_ATOMIC so that it won't sleep. This is safe because only small blocks are allocated. Signed-off-by: Timur Tabi <timur@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2008-04-17 09:50:38 -05:00
Steven Rostedt	0119536cd3	[POWERPC] Add hand-coded assembly strcmp We have an assembly version of strncmp for the bootwrapper, but not for the kernel, so we end up using the C version in the kernel. This takes the strncmp code from the bootup and copies it to the kernel proper, adding two instructions so it copes correctly with len==0. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-07 10:03:03 +10:00
Sylvain Munaut	1088a20998	[POWERPC] rheap: Changes config mechanism Instead of having in the makefile all the option that requires rheap, we define a configuration symbol and when needed we make sure it's selected. Signed-off-by: Sylvain Munaut <tnt@246tNt.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2007-10-16 17:09:21 -06:00
Sylvain Munaut	d4697af4f3	[POWERPC] exports rheap symbol to modules These can be useful in modules too. So we export them. Signed-off-by: Sylvain Munaut <tnt@246tNt.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2007-10-16 17:09:02 -06:00
Stephen Rothwell	5669c3cf19	[POWERPC] Limit range of __init_ref_ok somewhat This patch introduces zalloc_maybe_bootmem and uses it so that we don't have to mark a whole (largish) routine as __init_ref_ok. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-10-03 11:48:44 +10:00
Stephen Rothwell	2578bfae84	[POWERPC] Create and use CONFIG_WORD_SIZE Linus made this suggestion for the x86 merge and this starts the process for powerpc. We assume that CONFIG_PPC64 implies CONFIG_PPC_MERGE and CONFIG_PPC_STD_MMU_32 implies CONFIG_PPC_STD_MMU. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-10-03 09:12:02 +10:00
Stephen Rothwell	7b2c3c5b1d	[POWERPC] Fix section mismatch in PCI code Create a helper function (alloc_maybe_bootmem) that is marked __init_refok to limit the chances of mistakenly referring to other __init routines. WARNING: vmlinux.o(.text+0x2a9c4): Section mismatch: reference to .init.text:.__alloc_bootmem (between '.update_dn_pci_info' and '.pci_dn_reconfig_notifier') WARNING: vmlinux.o(.text+0x36430): Section mismatch: reference to .init.text:.__alloc_bootmem (between '.mpic_msi_init_allocator' and '.find_ht_magic_addr') WARNING: vmlinux.o(.text+0x5e804): Section mismatch: reference to .init.text:.__alloc_bootmem (between '.celleb_setup_phb' and '.celleb_fake_pci_write_config') WARNING: vmlinux.o(.text+0x5e8e8): Section mismatch: reference to .init.text:.__alloc_bootmem (between '.celleb_setup_phb' and '.celleb_fake_pci_write_config') WARNING: vmlinux.o(.text+0x5e968): Section mismatch: reference to .init.text:.__alloc_bootmem (between '.celleb_setup_phb' and '.celleb_fake_pci_write_config') Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-09-19 15:25:34 +10:00
Alexey Dobriyan	4e950f6f01	Remove fs.h from mm.h Remove fs.h from mm.h. For this, 1) Uninline vma_wants_writenotify(). It's pretty huge anyway. 2) Add back fs.h or less bloated headers (err.h) to files that need it. As result, on x86_64 allyesconfig, fs.h dependencies cut down from 3929 files rebuilt down to 3444 (-12.3%). Cross-compile tested without regressions on my two usual configs and (sigh): alpha arm-mx1ads mips-bigsur powerpc-ebony alpha-allnoconfig arm-neponset mips-capcella powerpc-g5 alpha-defconfig arm-netwinder mips-cobalt powerpc-holly alpha-up arm-netx mips-db1000 powerpc-iseries arm arm-ns9xxx mips-db1100 powerpc-linkstation arm-assabet arm-omap_h2_1610 mips-db1200 powerpc-lite5200 arm-at91rm9200dk arm-onearm mips-db1500 powerpc-maple arm-at91rm9200ek arm-picotux200 mips-db1550 powerpc-mpc7448_hpc2 arm-at91sam9260ek arm-pleb mips-ddb5477 powerpc-mpc8272_ads arm-at91sam9261ek arm-pnx4008 mips-decstation powerpc-mpc8313_rdb arm-at91sam9263ek arm-pxa255-idp mips-e55 powerpc-mpc832x_mds arm-at91sam9rlek arm-realview mips-emma2rh powerpc-mpc832x_rdb arm-ateb9200 arm-realview-smp mips-excite powerpc-mpc834x_itx arm-badge4 arm-rpc mips-fulong powerpc-mpc834x_itxgp arm-carmeva arm-s3c2410 mips-ip22 powerpc-mpc834x_mds arm-cerfcube arm-shannon mips-ip27 powerpc-mpc836x_mds arm-clps7500 arm-shark mips-ip32 powerpc-mpc8540_ads arm-collie arm-simpad mips-jazz powerpc-mpc8544_ds arm-corgi arm-spitz mips-jmr3927 powerpc-mpc8560_ads arm-csb337 arm-trizeps4 mips-malta powerpc-mpc8568mds arm-csb637 arm-versatile mips-mipssim powerpc-mpc85xx_cds arm-ebsa110 i386 mips-mpc30x powerpc-mpc8641_hpcn arm-edb7211 i386-allnoconfig mips-msp71xx powerpc-mpc866_ads arm-em_x270 i386-defconfig mips-ocelot powerpc-mpc885_ads arm-ep93xx i386-up mips-pb1100 powerpc-pasemi arm-footbridge ia64 mips-pb1500 powerpc-pmac32 arm-fortunet ia64-allnoconfig mips-pb1550 powerpc-ppc64 arm-h3600 ia64-bigsur mips-pnx8550-jbs powerpc-prpmc2800 arm-h7201 ia64-defconfig mips-pnx8550-stb810 
powerpc-ps3 arm-h7202 ia64-gensparse mips-qemu powerpc-pseries arm-hackkit ia64-sim mips-rbhma4200 powerpc-up arm-integrator ia64-sn2 mips-rbhma4500 s390 arm-iop13xx ia64-tiger mips-rm200 s390-allnoconfig arm-iop32x ia64-up mips-sb1250-swarm s390-defconfig arm-iop33x ia64-zx1 mips-sead s390-up arm-ixp2000 m68k mips-tb0219 sparc arm-ixp23xx m68k-amiga mips-tb0226 sparc-allnoconfig arm-ixp4xx m68k-apollo mips-tb0287 sparc-defconfig arm-jornada720 m68k-atari mips-workpad sparc-up arm-kafa m68k-bvme6000 mips-wrppmc sparc64 arm-kb9202 m68k-hp300 mips-yosemite sparc64-allnoconfig arm-ks8695 m68k-mac parisc sparc64-defconfig arm-lart m68k-mvme147 parisc-allnoconfig sparc64-up arm-lpd270 m68k-mvme16x parisc-defconfig um-x86_64 arm-lpd7a400 m68k-q40 parisc-up x86_64 arm-lpd7a404 m68k-sun3 powerpc x86_64-allnoconfig arm-lubbock m68k-sun3x powerpc-cell x86_64-defconfig arm-lusl7200 mips powerpc-celleb x86_64-up arm-mainstone mips-atlas powerpc-chrp32 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-29 17:09:29 -07:00
Li Yang	7c8545e984	[POWERPC] rheap - eliminates internal fragments caused by alignment The patch adds fragments caused by rh_alloc_align() back to free list, instead of allocating the whole chunk of memory. This will greatly improve memory utilization managed by rheap. It solves MURAM not enough problem with 3 UCCs enabled on MPC8323. Signed-off-by: Li Yang <leoli@freescale.com> Acked-by: Joakim Tjernlund <joakim.tjernlund@transmode.se> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2007-06-19 22:35:53 -05:00
Timur Tabi	1c2de47cd4	[POWERPC] Fix alignment problem in rh_alloc_align() with exact-sized blocks When an rheap is created, the caller can specify the alignment to use. In rh_alloc_align(), if a free block is found that is the exact size needed (including extra space for alignment), that configured alignment value is not used to align the pointer. Instead, the default alignment is used. If the default alignment is smaller than the configured alignment, then the returned value will not be aligned correctly. Signed-off-by: Timur Tabi <timur@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2007-05-17 21:10:16 +10:00
Kumar Gala	b99ab6a8c7	[POWERPC] User rheap from arch/powerpc/lib Removed rheap in arch/ppc/lib and changed build system to use the one in arch/powerpc/lib. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2007-05-09 23:28:17 -05:00
Timur Tabi	4c35630ccd	[POWERPC] Change rheap functions to use ulongs instead of pointers The rheap allocation functions return a pointer, but the actual value is based on how the heap was initialized, and so it can be anything, e.g. an offset into a buffer. A ulong is a better representation of the value returned by the allocation functions. This patch changes all of the relevant rheap functions to use unsigned long integers instead of pointers. In case of an error, the value returned is a negative error code that has been cast to an unsigned long. The caller can use the IS_ERR_VALUE() macro to check for this. All code which calls the rheap functions is updated accordingly. Macros IS_MURAM_ERR() and IS_DPERR(), have been deleted in favor of IS_ERR_VALUE(). Also added error checking to rh_attach_region(). Signed-off-by: Timur Tabi <timur@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2007-05-09 23:01:43 -05:00
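The error convention described in this commit (a negative errno cast to `unsigned long`, detected with `IS_ERR_VALUE()`) can be illustrated outside the kernel. In the sketch below, `MAX_ERRNO` and `IS_ERR_VALUE()` are redefined locally to mirror the kernel's idea, and `toy_alloc` is a hypothetical bump allocator that stands in for the rheap functions; none of it is the rheap code itself.

```c
#include <assert.h>
#include <errno.h>

/* Local stand-ins for the kernel definitions, for illustration only. */
#define MAX_ERRNO 4095
#define IS_ERR_VALUE(x) ((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

/*
 * Hypothetical allocator handing out offsets in [0, end).
 * Offset 0 is a valid result, so a NULL-style check cannot signal failure;
 * errors are returned as negative errno values cast to unsigned long.
 */
static unsigned long toy_alloc(unsigned long *next, unsigned long end,
                               unsigned long size)
{
    assert(*next <= end);

    if (size == 0)
        return (unsigned long)-EINVAL;
    if (size > end - *next)
        return (unsigned long)-ENOMEM;

    unsigned long start = *next;
    *next += size;
    return start;
}
```

A caller checks `IS_ERR_VALUE(result)` before using the offset and, on failure, can recover the errno by casting the value back to a signed `long`.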
David Gibson	d1953c8888	[POWERPC] Remove use of 4level-fixup.h for ppc32 For 32-bit systems, powerpc still relies on the 4level-fixup.h hack, to pretend that the generic pagetable handling stuff is 3-levels rather than 4. This patch removes this, instead using the newer pgtable-nopmd.h to handle the elision of both the pud and pmd pagetable levels (ppc32 pagetables are actually 2 levels). This removes a little extraneous code, and makes it more easily compared to the 64-bit pagetable code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-08 13:40:31 +10:00
Paul Mackerras	49e1900d4c	Merge branch 'linux-2.6' into for-2.6.22	2007-04-30 12:38:01 +10:00
David S. Miller	ded220bd8f	[STRING]: Move strcasecmp/strncasecmp to lib/string.c We have several platforms using local copies of identical code. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 01:54:39 -07:00
Ananth N Mavinakayanahalli	6888199f7f	[POWERPC] Emulate more instructions in software Emulate a few more instructions in software - especially useful during singlestepping (xmon/kprobes). Instructions emulated with this patch are mfcr/mtcr rX, mfxer/mtxer rX, mflr/mtlr rX, mfctr/mtctr rX and mr rA,rB. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-04-24 21:31:57 +10:00
Olof Johansson	3467bfd340	[POWERPC] Use mtocrf instruction in asm when CONFIG_POWER4_ONLY=y mtocrf is a faster single-field mtcrf (move to condition register fields) instruction available in POWER4 and later processors. It can make quite a difference in performance on some implementations, so use it for CONFIG_POWER4_ONLY builds. Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-04-13 03:55:13 +10:00
Stephen Rothwell	1c56f838a9	[POWERPC] Make ppc64_defconfig without CONFIG_PPC_PSERIES build Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-03-09 15:03:24 +11:00
Ahmed S. Darwish	3839a59439	[POWERPC] Use ARRAY_SIZE macro when appropriate Use ARRAY_SIZE macro already defined in linux/kernel.h Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-02-08 16:08:18 +11:00
Kumar Gala	8209003547	[POWERPC] Added kprobes support to ppc32 Added kprobes to ppc32 platforms that have use single_step_exception. This excludes 4xx and anything Book-E since their debug mechanisms for single stepping are completely different. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2007-02-06 22:55:19 -06:00
Timur Tabi	4942bd80e8	[POWERPC] Fix array indexing error in rheap grow() The grow() function in the rheap library allocates a larger array of blocks, copies the contents of the old blocks array to the newly allocated array and fixes the list_head pointers after the copy. At the end, the new blocks must be enqueued to the empty_list of the rh_info_t structure. This patch fixes a bug where the code was indexing past the end of the array when enqueueing blocks. The UCC ethernet driver, which uses the rheap allocator, experiences kernel panics because of this bug. Signed-off-by: Ionut Nicu <ionut.nicu@freescale.com> Signed-off-by: Timur Tabi <timur@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-02-07 14:03:19 +11:00
Vitaly Bordug	5902ebce22	[POWERPC] 8xx: generic 8xx code arch/powerpc port Including support for non-coherent cache, some mm-related things + relevant field in Kconfig and Makefiles. Also included rheap.o compilation if 8xx is defined. Non-coherent mapping were refined and renamed according to Cristoph Hellwig. Orphaned functions were cleaned up. [Also removed arch/ppc/kernel/dma-mapping.c, because otherwise compiling with ARCH=ppc for a non DMA-cache-coherent platform ends up with two copies of __dma_alloc_coherent etc. -- paulus.] Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-02-07 14:01:02 +11:00
Gui,Jian	0d69a052d4	[POWERPC] Disallow kprobes on emulate_step and branch_taken On powerpc, probing on emulate_step function will crash 2.6.18.1 when it is triggered. When kprobe is triggered, emulate_step() is on its kernel path and will cause recursive kprobe fault. And branch_taken() is called in emulate_step(). This disallows kprobes on both of them. Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-11-01 15:14:12 +11:00
Li Yang	5e98082358	[POWERPC] Fix rheap alignment problem Honor alignment parameter in the rheap allocator. This is needed by qe_lib. Remove compile warning. Signed-off-by: Pantelis Antoniou <pantelis@embeddedalley.com> Signed-off-by: Li Yang <leoli@freescale.com> Acked-by: Kumar Galak <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-10-02 20:27:47 +10:00
Paul Mackerras	4e6d816e51	Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/vitb/linux-2.6-PQ	2006-09-28 07:18:28 +10:00
Vitaly Bordug	b0c110b4f1	POWERPC: Move generic cpm2 stuff to powerpc This moves the cpm2 common code and PIC stuff to the powerpc. Most of the files were just copied from ppc/, with minor tuning to make it compile, and, subsequently, work. Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com>	2006-09-21 22:18:53 +04:00
Stephen Rothwell	4f896e53ee	[POWERPC] make spinlocks work in a combined kernel If we build a pSeries/iSeries combined kernel, we will need this. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>	2006-09-20 14:01:16 +10:00
Paul Mackerras	aa43f77939	Merge branch 'merge'	2006-08-31 15:45:48 +10:00
Paul Mackerras	d0027bf09f	[POWERPC] Fix return value from memcpy As pointed out by Herbert Xu <herbert@gondor.apana.org.au>, our memcpy implementation didn't return the destination pointer as its return value, and there is code in the kernel that expects that. This fixes it. Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-08-31 13:22:58 +10:00
Michael Ellerman	dac411e7aa	[POWERPC] iseries: Move e2a()/strne2a() into their only caller The ASCII -> EBCDIC functions, e2a() and strne2a() are now only used in dt.c, so move them in there. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>	2006-07-13 18:42:03 +10:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Anton Blanchard	8555a0029b	[POWERPC] Optimise some TOC usage Micro-optimisation - add no-minimal-toc to some more arch/powerpc Makefiles. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-06-15 19:31:25 +10:00
Jon Mason	0a9cb46a73	[PATCH] remove powerpc bitops in favor of existing generic bitops There already exists a big endian safe bitops implementation in lib/find_next_bit.c. The code in it is 90%+ common with the powerpc specific version, so the powerpc version is redundant. This patch makes the necessary changes to use the generic bitops in powerpc, and removes the powerpc specific version. Signed-off-by: Jon Mason <jdmason@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-05-24 16:08:58 +10:00
Stephen Rothwell	af308377e2	[PATCH] powerpc: fix various sparse warnings Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-27 14:48:08 +11:00
Linus Torvalds	3cbb90a9cb	powerpc: fix strncasecmp prototype It takes a size_t, not an int, as its third argument. Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-03-25 09:41:40 -08:00
Michael Ellerman	584fc6d111	[PATCH] powerpc: Add strne2a() to convert a string from EBCDIC to ASCII Add strne2a() which converts a string from EBCDIC to ASCII. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-22 15:04:25 +11:00
Jon Mason	2ef9481e66	[PATCH] powerpc: trivial: modify comments to refer to new location of files This patch removes all self references and fixes references to files in the now defunct arch/ppc64 tree. I think this accomplises everything wanted, though there might be a few references I missed. Signed-off-by: Jon Mason <jdmason@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-02-10 16:53:51 +11:00
David Gibson	3356bb9f7b	[PATCH] powerpc: Remove lppaca structure from the PACA At present the lppaca - the structure shared with the iSeries hypervisor and phyp - is contained within the PACA, our own low-level per-cpu structure. This doesn't have to be so, the patch below removes it, making a separate array of lppaca structures. This saves approximately 500*NR_CPUS bytes of image size and kernel memory, because we don't need aligning gap between the Linux and hypervisor portions of every PACA. On the other hand it means an extra level of dereference in many accesses to the lppaca. The patch also gets rid of several places where we assign the paca address to a local variable for no particular reason. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-01-13 21:17:39 +11:00
Paul Mackerras	00557b59c6	powerpc: Fix find_next_bit on 32-bit We had a "64" that didn't get changed to BITS_PER_LONG, resulting in find_next_bit not working correctly. Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-11-10 12:01:41 +11:00
Paul Mackerras	c613523455	Merge ../linux-2.6	2005-11-07 14:42:09 +11:00
Paul Mackerras	2249ca9d60	powerpc: Various UP build fixes Mostly this involves adding #include <asm/smp.h>, since that defines things like boot_cpuid[_phys] and [gs]et_hard_smp_processor_id, which are SMP-related but still needed on UP. This incorporates fixes posted by Olof Johansson and Heikki Lindholm. Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-11-07 13:18:13 +11:00
Benjamin Herrenschmidt	3c726f8dee	[PATCH] ppc64: support 64k pages Adds a new CONFIG_PPC_64K_PAGES which, when enabled, changes the kernel base page size to 64K. The resulting kernel still boots on any hardware. On current machines with 4K pages support only, the kernel will maintain 16 "subpages" for each 64K page transparently. Note that while real 64K capable HW has been tested, the current patch will not enable it yet as such hardware is not released yet, and I'm still verifying with the firmware architects the proper to get the information from the newer hypervisors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-11-06 16:56:47 -08:00
Stephen Rothwell	aaf8a7a294	Merge iSeries include file move	2005-11-02 16:06:03 +11:00
David Gibson	a0e60b2033	[PATCH] powerpc: Merge bitops.h Here's a revised version. This re-introduces the set_bits() function from ppc64, which I removed because I thought it was unused (it exists on no other arch). In fact it is used in the powermac interrupt code (but not on pSeries). - We use LARXL/STCXL macros to generate the right (32 or 64 bit) instructions, similar to LDL/STL from ppc_asm.h, used in fpu.S - ppc32 previously used a full "sync" barrier at the end of test_and_*_bit(), whereas ppc64 used an "isync". The merged version uses "isync", since I believe that's sufficient. - The ppc64 versions of then minix_*() bitmap functions have changed semantics. Previously on ppc64, these functions were big-endian (that is bit 0 was the LSB in the first 64-bit, big-endian word). On ppc32 (and x86, for that matter, they were little-endian. As far as I can tell, the big-endian usage was simply wrong - I guess no-one ever tried to use minixfs on ppc64. - On ppc32 find_next_bit() and find_next_zero_bit() are no longer inline (they were already out-of-line on ppc64). - For ppc64, sched_find_first_bit() has moved from mmu_context.h to the merged bitops. What it was doing in mmu_context.h in the first place, I have no idea. - The fls() function is now implemented using the cntlzw instruction on ppc64, instead of generic_fls(), as it already was on ppc32. - For ARCH=ppc, this patch requires adding arch/powerpc/lib to the arch/ppc/Makefile. This in turn requires some changes to arch/powerpc/lib/Makefile which didn't correctly handle ARCH=ppc. Built and running on G5. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2005-11-01 21:49:02 +11:00
Kelly Daly	1da4403788	merge filename and modify references to iseries/hv_call.h Signed-off-by: Kelly Daly <kelly@au.ibm.com>	2005-11-01 16:59:20 +11:00
Stephen Rothwell	5015b49448	powerpc: fix __strnlen_user in merge tree Change USER/KERNEL_DS so that the merged version of __strnlen_user can be used which allows us to complete the removal of arch/ppc64/lib/. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>	2005-11-01 14:34:17 +11:00

269 Commits