linux_dsm_epyc7002/arch/powerpc/kernel
Anton Blanchard 7a0268fa1a [PATCH] powerpc/64: per cpu data optimisations
The current ppc64 per cpu data implementation is quite slow, e.g.:

        lhz 11,18(13)           /* smp_processor_id() */
        ld 9,.LC63-.LCTOC1(30)  /* per_cpu__variable_name */
        ld 8,.LC61-.LCTOC1(30)  /* __per_cpu_offset */
        sldi 11,11,3            /* form index into __per_cpu_offset */
        mr 10,9
        ldx 9,11,8              /* __per_cpu_offset[smp_processor_id()] */
        ldx 0,10,9              /* load per cpu data */

5 loads for something that is supposed to be fast, pretty awful. One
reason for the large number of loads is that we have to synthesize 2
64bit constants (per_cpu__variable_name and __per_cpu_offset).

By putting __per_cpu_offset into the paca we can avoid the 2 loads
associated with it:

        ld 11,56(13)            /* paca->data_offset */
        ld 9,.LC59-.LCTOC1(30)  /* per_cpu__variable_name */
        ldx 0,9,11              /* load per cpu data */
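
As a rough user-space illustration of the two schemes (not the kernel code
itself), the sketch below models the old lookup through a global offset table
and the new lookup through a field in the paca. The names paca_sketch,
percpu_region, setup_offsets, old_lookup and new_lookup, and the sizes
NR_CPUS == 4 and SLOTS == 8, are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4   /* illustrative value, not taken from the patch */
#define SLOTS   8   /* longs per cpu area, illustrative */

/* One flat region holding every cpu's copy of the per cpu data. */
static long percpu_region[NR_CPUS * SLOTS];

/* Old scheme: a global table indexed by cpu number
 * (models __per_cpu_offset[]). */
static size_t per_cpu_offset[NR_CPUS];

/* Hypothetical stand-in for the paca, which r13 points at on ppc64. */
struct paca_sketch {
    unsigned short cpu_id;  /* what smp_processor_id() reads */
    size_t data_offset;     /* new scheme: this cpu's offset */
};
static struct paca_sketch pacas[NR_CPUS];

static void setup_offsets(void)
{
    for (size_t cpu = 0; cpu < NR_CPUS; cpu++) {
        per_cpu_offset[cpu] = cpu * SLOTS;
        pacas[cpu].cpu_id = (unsigned short)cpu;
        pacas[cpu].data_offset = per_cpu_offset[cpu];
    }
}

/* Old: read the cpu id, index the global table, then reach the data. */
static long *old_lookup(const struct paca_sketch *paca, size_t var)
{
    assert(var < SLOTS);
    return &percpu_region[var + per_cpu_offset[paca->cpu_id]];
}

/* New: the offset comes straight from the paca, no table lookup. */
static long *new_lookup(const struct paca_sketch *paca, size_t var)
{
    assert(var < SLOTS);
    return &percpu_region[var + paca->data_offset];
}
```

Both helpers resolve to the same address for a given cpu and variable; the
new one simply skips the cpu id read and the table indexing.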

Longer term we should be able to do even better than 3 loads.
If per_cpu__variable_name wasn't a 64bit constant and paca->data_offset
was in a register we could cut it down to one load. A suggestion from
Rusty is to use gcc's __thread extension here. In order to do this we
would need to free up r13 (the __thread register and where the paca
currently is). So far I've had a few unsuccessful attempts at doing that :)

The patch also allocates per cpu memory node local on NUMA machines.
This patch from Rusty has been sitting in my queue _forever_ but stalled
when I hit the compiler bug. Sorry about that.

Finally I also only allocate per cpu data for possible cpus, which comes
straight out of the x86-64 port. On a pseries kernel (with NR_CPUS == 128)
and 4 possible cpus we see some nice gains:

             total       used       free     shared    buffers     cached
Mem:       4012228     212860    3799368          0          0     162424

             total       used       free     shared    buffers     cached
Mem:       4016200     212984    3803216          0          0     162424

A saving of 3.75MB. Quite nice for smaller machines. Note: we now have
to be careful of per cpu users that touch data for !possible cpus.
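
The quoted saving can be checked from the two "free" columns. The sketch
below assumes the first reading was taken before the patch and the second
after, and that the figures are in KiB (the default unit of free). The helper
names saved_kib and saved_kib_per_skipped_cpu are made up for this example.

```c
#include <assert.h>

/* Difference between two "free" readings, in KiB. */
static long saved_kib(long free_before_kib, long free_after_kib)
{
    assert(free_after_kib >= free_before_kib);
    return free_after_kib - free_before_kib;
}

/* Average saving per cpu slot that is no longer allocated. */
static double saved_kib_per_skipped_cpu(long saved, int nr_cpus, int possible)
{
    assert(nr_cpus > possible);
    return (double)saved / (nr_cpus - possible);
}
```

With the numbers above, saved_kib(3799368, 3803216) is 3848 KiB, i.e. about
3.76MB, which works out to roughly 31 KiB for each of the 124 cpus that no
longer get an area.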

At this stage it might be worth making the NUMA and possible cpu
optimisations generic, but per cpu init is done so early we have to be
careful that all architectures have their possible map set up correctly.
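
A minimal user-space sketch of the "possible cpus only" idea, assuming a
plain bool array as the possible map and malloc() as a stand-in for the
node-local boot-time allocator. cpu_area, setup_areas and area_of are
hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS 128  /* matches the pseries config mentioned above */

static char *cpu_area[NR_CPUS];  /* NULL for cpus that are not possible */

/* Give each possible cpu its own copy of the template data.
 * Returns the number of areas allocated, or -1 if malloc fails. */
static int setup_areas(const bool possible[NR_CPUS],
                       const void *tmpl, size_t size)
{
    int n = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        cpu_area[cpu] = NULL;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!possible[cpu])
            continue;            /* no memory spent on this slot */
        /* Assumption: the kernel would pick memory on the cpu's node. */
        cpu_area[cpu] = malloc(size);
        if (!cpu_area[cpu])
            return -1;
        memcpy(cpu_area[cpu], tmpl, size);
        n++;
    }
    return n;
}

/* Callers must handle NULL: !possible cpus have no area at all. */
static void *area_of(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return cpu_area[cpu];
}
```

The NULL return from area_of() is the user-space analogue of the caveat
about per cpu users that touch data for !possible cpus: if the possible map
is wrong at init time, those accesses have nothing valid to land on.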

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-11 14:49:45 +11:00
vdso32 [PATCH] powerpc: Make the vDSO functions set error code (#2) 2005-11-16 14:05:11 +11:00
vdso64 [PATCH] powerpc: Make the vDSO functions set error code (#2) 2005-11-16 14:05:11 +11:00
align.c [PATCH] powerpc: merge align.c 2005-11-18 14:39:23 +11:00
asm-offsets.c [PATCH] powerpc: Remove some unneeded fields from the paca 2006-01-09 14:50:35 +11:00
binfmt_elf32.c ppc64: merge binfmt_elf32.c 2005-10-13 13:40:54 +10:00
btext.c [PATCH] powerpc: Remove device_node addrs/n_addr 2006-01-09 14:53:55 +11:00
cpu_setup_power4.S [PATCH] powerpc: Move various ppc64 files with no ppc32 equivalent to powerpc 2005-11-10 11:24:04 +11:00
cputable.c [PATCH] ppc64: POWER5+ oprofile support 2006-01-09 16:03:30 +11:00
crash_dump.c [PATCH] powerpc: Add arch-dependent copy_oldmem_page 2006-01-09 14:52:35 +11:00
crash.c [PATCH] powerpc: Add arch dependent basic infrastructure for Kdump. 2006-01-09 14:52:28 +11:00
dma_64.c [PATCH] powerpc: IBMEBUS bus support 2006-01-09 14:49:06 +11:00
entry_32.S [PATCH] Fix code that saves NVGPRS in 32-bit signal frame 2006-01-09 14:50:48 +11:00
entry_64.S [PATCH] powerpc: Separate usage of KERNELBASE and PAGE_OFFSET 2006-01-09 14:51:54 +11:00
firmware.c [PATCH] powerpc: Move various ppc64 files with no ppc32 equivalent to powerpc 2005-11-10 11:24:04 +11:00
fpu.S [PATCH] powerpc: Consolidate asm compatibility macros 2005-11-10 13:10:38 +11:00
head_4xx.S powerpc: Rename asm offset TRAP to _TRAP for 32-bit 2005-10-28 22:45:25 +10:00
head_8xx.S powerpc: Rename asm offset TRAP to _TRAP for 32-bit 2005-10-28 22:45:25 +10:00
head_32.S powerpc: set CONFIG_PPC_OF=y always for ARCH=powerpc 2006-01-09 20:17:01 +11:00
head_44x.S [PATCH] powerpc: replace use of _GLOBAL with .globl 2005-10-17 21:43:12 +10:00
head_64.S powerpc: unbreak iSeries compilation again 2006-01-09 21:32:42 +11:00
head_fsl_booke.S [PATCH] Update email address for Kumar 2005-11-13 18:14:10 -08:00
ibmebus.c [PATCH] powerpc: IBMEBUS bus support 2006-01-09 14:49:06 +11:00
idle_6xx.S powerpc: Use reg.h instead of processor.h when we just want reg names 2005-10-10 22:20:10 +10:00
idle_64.c powerpc: Move remaining .c files from arch/ppc64 to arch/powerpc 2005-11-18 15:43:34 +11:00
idle_power4.S [PATCH] powerpc: Fix use of LOADBASE in merge tree 2005-10-17 21:43:12 +10:00
init_task.c powerpc: make process.c suitable for both 32-bit and 64-bit 2005-10-10 22:29:05 +10:00
iomap.c powerpc: Move most remaining ppc64 files over to arch/powerpc 2005-11-14 17:30:17 +11:00
iommu.c powerpc: Move most remaining ppc64 files over to arch/powerpc 2005-11-14 17:30:17 +11:00
irq.c powerpc: reduce include in irq.c 2006-01-09 14:50:15 +11:00
kprobes.c [PATCH] kprobes: fix build breakage 2006-01-10 08:01:40 -08:00
legacy_serial.c [PATCH] powerpc: fixing compile issue with !CONFIG_PCI in legacy_serial.c 2006-01-09 15:44:30 +11:00
lparcfg.c powerpc: iSeries build fixes 2005-11-14 17:14:51 +11:00
lparmap.c [PATCH] powerpc: Fix iSeries bug in VMALLOCBASE/VMALLOC_START consolidation 2006-01-09 15:06:06 +11:00
machine_kexec_32.c [PATCH] powerpc: Merge kexec 2006-01-09 14:48:52 +11:00
machine_kexec_64.c [PATCH] powerpc: Add arch dependent basic infrastructure for Kdump. 2006-01-09 14:52:28 +11:00
machine_kexec.c [PATCH] powerpc: remove remaining crash_notes variable from machine_kexec.c 2006-01-11 14:48:02 +11:00
Makefile Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge 2006-01-10 08:28:32 -08:00
misc_32.S [PATCH] powerpc: Merge kexec 2006-01-09 14:48:52 +11:00
misc_64.S ppc64: remove ppc_irq_dispatch_handler 2005-11-09 16:19:53 +11:00
module_64.c powerpc: Move most remaining ppc64 files over to arch/powerpc 2005-11-14 17:30:17 +11:00
nvram_64.c [PATCH] powerpc: fix large nvram access 2006-01-09 14:53:31 +11:00
of_device.c powerpc: apply recent changes to merged code 2005-10-31 13:57:01 +11:00
paca.c [PATCH] powerpc: Remove some unneeded fields from the paca 2006-01-09 14:50:35 +11:00
pci_64.c [PATCH] PCI Hotplug/powerpc: module build break 2006-01-11 14:47:30 +11:00
pci_direct_iommu.c powerpc: Move most remaining ppc64 files over to arch/powerpc 2005-11-14 17:30:17 +11:00
pci_dn.c powerpc: Move most remaining ppc64 files over to arch/powerpc 2005-11-14 17:30:17 +11:00
pci_iommu.c powerpc: Move most remaining ppc64 files over to arch/powerpc 2005-11-14 17:30:17 +11:00
pmc.c [PATCH] powerpc: G4+ oprofile support 2006-01-09 15:06:03 +11:00
ppc32.h powerpc: move include/asm-ppc64/ppc32.h to arch/powerpc/kernel 2005-11-03 16:03:28 +11:00
ppc_ksyms.c [PATCH] powerpc: Some ppc compile fixes... 2006-01-10 16:49:20 +11:00
proc_ppc64.c [PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel 2005-11-11 22:25:39 +11:00
process.c powerpc: Fix bug causing FP registers corruption on UP + preempt 2005-11-30 13:20:54 +11:00
prom_init.c [PATCH] powerpc: Remove device_node addrs/n_addr 2006-01-09 14:53:55 +11:00
prom_parse.c [PATCH] powerpc: pci_address_to_pio fix 2006-01-09 15:05:56 +11:00
prom.c spelling: s/retreive/retrieve/ 2006-01-10 00:10:13 +01:00
ptrace32.c [PATCH] use ptrace_get_task_struct in various places 2006-01-08 20:13:51 -08:00
ptrace-common.h powerpc: move include/asm-ppc64/ptrace-common.h to arch/powerpc/kernel 2005-11-19 20:47:22 +11:00
ptrace.c powerpc: move include/asm-ppc64/ptrace-common.h to arch/powerpc/kernel 2005-11-19 20:47:22 +11:00
rtas_flash.c powerpc: Merge remaining RTAS code 2005-11-03 14:41:19 +11:00
rtas_pci.c [PATCH] powerpc: Save device BARs much earlier in the boot sequence 2006-01-10 15:30:39 +11:00
rtas-proc.c [PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel 2005-11-11 22:25:39 +11:00
rtas-rtc.c powerpc: time-of-day fixes for 32-bit CHRP systems 2005-11-18 15:52:38 +11:00
rtas.c [PATCH] powerpc: Make early debugging configurable via Kconfig 2006-01-11 14:48:26 +11:00
semaphore.c powerpc: Merge enough to start building in arch/powerpc. 2005-09-26 16:04:21 +10:00
setup_32.c powerpc: Introduce a new config symbol to control 16550 early debug code 2006-01-10 16:19:05 +11:00
setup_64.c [PATCH] powerpc/64: per cpu data optimisations 2006-01-11 14:49:45 +11:00
setup-common.c [PATCH] powerpc: Add a is_kernel_addr() macro 2006-01-09 14:51:50 +11:00
setup.h powerpc: create kernel/setup.h 2005-11-09 11:35:26 +11:00
signal_32.c [PATCH] Save NVGPRS in 32-bit signal frame 2006-01-09 14:50:45 +11:00
signal_64.c [PATCH] syscall entry/exit revamp 2006-01-09 14:49:01 +11:00
smp-tbsync.c powerpc: Merge smp-tbsync.c (the generic timebase sync routine) 2005-11-04 13:28:58 +11:00
smp.c [PATCH] ppc64: Add NUMA cpu summary at boot 2006-01-09 14:53:37 +11:00
sys_ppc32.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge 2006-01-10 08:28:32 -08:00
syscalls.c [PATCH] ppc64: fix time syscall 2006-01-09 15:47:13 +11:00
sysfs.c [PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel 2005-11-11 22:25:39 +11:00
systbl.S Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge 2006-01-10 08:28:32 -08:00
time.c [PATCH] powerpc: Remove some unneeded fields from the paca 2006-01-09 14:50:35 +11:00
traps.c [PATCH] cell: enable pause(0) in cpu_idle 2006-01-09 15:44:32 +11:00
udbg_16550.c [PATCH] powerpc: Make early debugging configurable via Kconfig 2006-01-11 14:48:26 +11:00
udbg.c [PATCH] powerpc: Make early debugging configurable via Kconfig 2006-01-11 14:48:26 +11:00
vdso.c mm: re-architect the VM_UNPAGED logic 2005-11-28 14:34:23 -08:00
vecemu.c [PATCH] powerpc: Move arch/ppc*/kernel/vecemu.c to arch/powerpc 2005-09-21 19:21:07 +10:00
vector.S powerpc: Use reg.h instead of processor.h when we just want reg names 2005-10-10 22:20:10 +10:00
vio.c [PATCH] driver core: replace "hotplug" by "uevent" 2006-01-04 16:18:08 -08:00
vmlinux.lds.S powerpc: Fix vmlinux.lds.S for 32-bit 2005-11-05 10:36:59 +11:00