linux_dsm_epyc7002/arch/powerpc/kernel
Robert Jennings a90ab95a95 powerpc/pseries: vio bus support for CMO
This is a large patch but the normal code path is not affected.  For
non-pSeries platforms the code is ifdef'ed out and for non-CMO enabled
pSeries systems this does not affect the normal code path.  Devices that
do not perform DMA operations do not need modification with this patch.
The function get_desired_dma was renamed from get_io_entitlement for
clarity.

Overview

Cooperative Memory Overcommitment (CMO) allows for a set of OS partitions
to be run with less RAM than the aggregate needs of the group of
partitions.  The firmware will balance memory between the partitions
and page in/out memory as needed.  Based on the number and type of IO
adapters present, each partition is allocated an amount of memory for
DMA operations and this allocation will be guaranteed to the partition;
this is referred to as the partition's 'entitlement'.

Partitions running in a CMO environment can only have virtual IO devices
present.  The VIO bus layer will manage the IO entitlement for the system.
Accounting, at a system and per-device level, is tracked in the VIO bus
code and exposed via sysfs.  A set of dma_ops functions is added to
the bus to allow for this accounting.

Bus initialization

At initialization, the bus will calculate the minimum needs of the system
based on providing each device present with a standard minimum entitlement
along with a spare allocation for the bus to handle hotplug events.
If the minimum needs cannot be met, the system boot will be halted.

Device changes

The significant changes for devices while running under CMO are that the
devices must specify how much dedicated IO entitlement they desire and
must also handle DMA mapping errors that can occur due to constrained
IO memory.  The virtual IO drivers are modified to silence errors when
DMA mappings fail for CMO and handle these failures gracefully.

Each device will be guaranteed a minimum entitlement that can always
be mapped.  Devices will specify how much entitlement they desire and
the VIO bus will attempt to provide for this.  Devices can change their
desired entitlement level at any point in time to address particular needs
(via vio_cmo_set_dev_desired()), not just at device probe time.

VIO bus changes

The system will have a particular entitlement level available from which
it can provide memory to the devices.  The bus defines two pools of memory
within this entitlement, the reserve and excess pools.  Each device is
provided with its own entitlement no less than a system defined minimum
entitlement and no greater than what the device has specified as its
desired entitlement.  The entitlement provided to devices comes from the
reserve pool.  The reserve pool can also contain a spare allocation as
large as the system defined minimum entitlement which is used for device
hotplug events.  Any entitlement not needed to fulfill the needs of the
reserve pool is placed in the excess pool.  Each device is guaranteed
that it can map up to its entitled level; additional mappings are possible
as long as there is unmapped memory in the excess pool.
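
The mapping rule above can be sketched in a few lines of C.  This is a
simplified model, not the code in vio.c: the struct and function names
(cmo_pool, cmo_dev, cmo_map, cmo_unmap) are hypothetical, sizes are plain
byte counts, and locking is left out.  A mapping succeeds if it fits
within the device's own entitlement or if the part beyond the entitlement
still fits in the unmapped portion of the excess pool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the excess pool; not a kernel structure. */
struct cmo_pool {
	size_t size;	/* bytes assigned to the pool */
	size_t used;	/* bytes currently mapped from the pool */
};

/* Hypothetical per-device accounting. */
struct cmo_dev {
	size_t entitled;	/* guaranteed amount, backed by the reserve pool */
	size_t allocated;	/* bytes this device currently has mapped */
};

/* Account for a mapping of len bytes.  Returns 0 on success, -1 on failure. */
int cmo_map(struct cmo_dev *dev, struct cmo_pool *excess, size_t len)
{
	size_t within = 0;	/* part covered by the device's entitlement */
	size_t beyond;		/* part that must come from the excess pool */

	if (dev->allocated < dev->entitled) {
		within = dev->entitled - dev->allocated;
		if (within > len)
			within = len;
	}
	beyond = len - within;

	if (beyond > excess->size - excess->used)
		return -1;	/* excess pool exhausted: the mapping fails */

	excess->used += beyond;
	dev->allocated += len;
	return 0;
}

/* Release len bytes; the caller must not release more than is allocated. */
void cmo_unmap(struct cmo_dev *dev, struct cmo_pool *excess, size_t len)
{
	size_t over = 0;	/* bytes this device holds from the excess pool */

	if (dev->allocated > dev->entitled)
		over = dev->allocated - dev->entitled;
	if (over > len)
		over = len;

	excess->used -= over;	/* excess-pool bytes are returned first */
	dev->allocated -= len;
}
```

In this model a device with a 4096 byte entitlement can always map its
first 4096 bytes, while anything further competes with other devices for
the excess pool.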

Bus probe

As the system starts, each device is given an entitlement equal only
to the system defined minimum entitlement.  The reserve pool is equal
to the sum of these entitlements, plus a spare allocation.  The VIO bus
also tracks the aggregate desired entitlement of all the devices.  If the
system desired entitlement is greater than the size of the reserve pool,
when devices unmap IO memory it will be reserved and a balance operation
will be scheduled for some time in the future.
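
The start-up arithmetic can be illustrated as follows.  This is a sketch
under stated assumptions rather than the actual probe code: the names
cmo_bus_state and cmo_bus_init are hypothetical, each device's desired
value is clamped to the minimum, and the aggregate desired figure is
assumed to include the spare allocation so that it is comparable with the
reserve pool size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bus-level summary; not a kernel structure. */
struct cmo_bus_state {
	size_t reserve;		/* sum of device entitlements plus the spare */
	size_t excess;		/* system entitlement left over after the reserve */
	size_t desired;		/* aggregate desired entitlement (assumed to include spare) */
	int needs_balance;	/* nonzero if desired exceeds the reserve pool */
};

/*
 * Size the pools at start-up.  Every device starts at the minimum
 * entitlement and one extra minimum is held back as the hotplug spare.
 * Returns -1 if the system entitlement cannot cover the minimum needs.
 */
int cmo_bus_init(struct cmo_bus_state *bus, size_t system_entitlement,
		 size_t min_entitlement, const size_t *dev_desired,
		 size_t ndevs)
{
	size_t reserve = (ndevs + 1) * min_entitlement;
	size_t i;

	if (reserve > system_entitlement)
		return -1;	/* minimum needs cannot be met */

	bus->reserve = reserve;
	bus->excess = system_entitlement - reserve;

	bus->desired = min_entitlement;	/* the spare */
	for (i = 0; i < ndevs; i++) {
		size_t want = dev_desired[i];

		if (want < min_entitlement)
			want = min_entitlement;
		bus->desired += want;
	}

	bus->needs_balance = bus->desired > bus->reserve;
	return 0;
}
```

With two devices, a 4096 byte minimum and a 65536 byte system
entitlement, the reserve pool starts at 12288 bytes and the remaining
53248 bytes form the excess pool.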

Entitlement balancing

The balance function tries to fairly distribute entitlement between the
devices in the system with the goal of providing each device with its
desired amount of entitlement.  Devices using more than what would be
ideal will have their entitled set-point adjusted; this will effectively
set a goal for lower IO memory usage as future mappings can fail and
deallocations will trigger a balance operation to distribute the newly
unmapped memory.  A fair distribution of entitlement can take several
balance operations to achieve.  Entitlement changes and device DLPAR
events will alter the state of CMO and will trigger balance operations.
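
One way to picture a fair distribution is an equal-share loop such as the
one below.  It is a deliberately simplified stand-in, not the balance
function added by this patch: the name cmo_balance is hypothetical, it
ignores how much memory each device currently has mapped, and it reaches
the final split in a single call instead of over several balance
operations.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Distribute avail bytes of entitlement across ndevs devices.
 * Every device first receives min_entitlement; what remains is shared
 * equally among devices still below their desired level until either
 * all are satisfied or nothing is left.  The caller must ensure that
 * avail >= ndevs * min_entitlement.  Returns the undistributed
 * remainder, which in this model would go to the excess pool.
 */
size_t cmo_balance(size_t *entitled, const size_t *desired, size_t ndevs,
		   size_t min_entitlement, size_t avail)
{
	size_t i;

	for (i = 0; i < ndevs; i++)
		entitled[i] = min_entitlement;
	avail -= ndevs * min_entitlement;

	while (avail > 0) {
		size_t need = 0;	/* devices still below desired */
		size_t chunk;

		for (i = 0; i < ndevs; i++)
			if (entitled[i] < desired[i])
				need++;
		if (need == 0)
			break;

		chunk = avail / need;
		if (chunk == 0)
			chunk = 1;	/* fewer bytes left than devices */

		for (i = 0; i < ndevs && avail > 0; i++) {
			size_t give;

			if (entitled[i] >= desired[i])
				continue;
			give = desired[i] - entitled[i];
			if (give > chunk)
				give = chunk;
			if (give > avail)
				give = avail;
			entitled[i] += give;
			avail -= give;
		}
	}
	return avail;
}
```

Devices that want little are satisfied early, and the share they do not
need is redistributed to the remaining devices on the next pass of the
loop.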

Hotplug events

The VIO bus allows for changes in system entitlement at run-time via
'vio_cmo_entitlement_update()'.  When devices are added the hotplug
device event will be preceded by a system entitlement increase and this
is reversed when devices are removed.

The following changes are made to the VIO bus layer for CMO:
 * add IO memory accounting per device structure.
 * add IO memory entitlement query function to driver structure.
 * during vio bus probe, if CMO is enabled, check that driver has
   memory entitlement query function defined.  Fail if function not defined.
 * fail to register driver if io entitlement function not defined.
 * create set of dma_ops at vio level for CMO that will track allocations
   and return DMA failures once entitlement is reached.  Entitlement will
   be limited by overall system entitlement.  Devices will have a reserved
   quantity of memory that is guaranteed, the rest can be used as available.
 * expose entitlement, current allocation, desired allocation, and the
   allocation error counter for devices to the user through sysfs
 * provide mechanism for changing a device's desired entitlement at run time
   as an exported function and sysfs tunable
 * track any DMA failures for entitled IO memory for each vio device.
 * check entitlement against available system entitlement on device add
 * track entitlement metrics (high water mark, current usage)
 * provide function to reset high water mark
 * provide minimum and desired entitlement numbers at a bus level
 * provide drivers with a minimum guaranteed entitlement
 * balance available entitlement between devices to satisfy their needs
 * handle system entitlement changes and device hotplug

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-25 15:44:43 +10:00
vdso32 powerpc: Fixup lwsync at runtime 2008-07-03 16:58:10 +10:00
vdso64 powerpc: Fixup lwsync at runtime 2008-07-03 16:58:10 +10:00
align.c powerpc: Add VSX load/store alignment exception handler 2008-07-15 12:29:25 +10:00
asm-offsets.c powerpc: Introduce VSX thread_struct and CONFIG_VSX 2008-07-01 11:28:46 +10:00
audit.c
btext.c [POWERPC] Remove duplicate #include 2008-05-09 20:22:58 +10:00
clock.c [POWERPC] clk.h interface for platforms 2007-10-03 09:11:56 +10:00
compat_audit.c
cpu_setup_6xx.S [POWERPC] ppc32: Fix errata for 603 CPUs 2008-04-21 15:00:32 -05:00
cpu_setup_44x.S Revert "[POWERPC] 4xx: Fix 460GT support to not enable FPU" 2008-06-11 07:52:40 -04:00
cpu_setup_pa6t.S
cpu_setup_ppc970.S
cputable.c powerpc: Enable AT_BASE_PLATFORM aux vector 2008-07-25 15:44:39 +10:00
crash_dump.c powerpc: Add PPC_NOP_INSTR, a hash define for the preferred nop instruction 2008-07-01 11:28:23 +10:00
crash.c powerpc: Increase CRASH_HANDLER_MAX 2008-06-30 22:31:00 +10:00
dma_64.c powerpc: move device_to_mask() to dma-mapping.h 2008-07-09 16:30:44 +10:00
entry_32.S powerpc: BookE hardware watchpoint support 2008-07-25 15:44:39 +10:00
entry_64.S Merge commit '85082fd7cbe3173198aac0eb5e85ab1edcc6352c' into test-build 2008-07-15 15:44:51 +10:00
firmware.c
fpu.S powerpc: Add VSX context save/restore, ptrace and signal support 2008-07-01 11:28:50 +10:00
ftrace.c ftrace: store mcount address in rec->ip 2008-06-23 22:10:56 +02:00
head_8xx.S [POWERPC] 8xx: fix swap 2008-03-07 08:42:28 -06:00
head_32.S powerpc: Make load_up_fpu and load_up_altivec callable 2008-07-01 11:28:45 +10:00
head_40x.S [POWERPC] 40x/Book-E: Save/restore volatile exception registers 2008-06-02 14:56:35 -05:00
head_44x.S powerpc: rework 4xx PTE access and TLB miss 2008-07-09 13:36:17 -04:00
head_64.S powerpc: Don't spin on sync instruction at boot time 2008-07-15 12:29:28 +10:00
head_booke.h powerpc: rework 4xx PTE access and TLB miss 2008-07-09 13:36:17 -04:00
head_fsl_booke.S powerpc/fsl: Minor TLBSYNC cleanup for FSL Book-E 2008-07-16 17:57:52 -05:00
ibmebus.c powerpc/ibmebus: more meaningful variable name 2008-07-09 16:30:46 +10:00
idle_6xx.S powerpc/85xx: add DOZE/NAP support for e500 core 2008-06-26 01:48:56 -05:00
idle_e500.S powerpc/e500mc: flush L2 on NAP for e500mc 2008-06-26 01:49:03 -05:00
idle_power4.S
idle.c nohz: prevent tick stop outside of the idle loop 2008-07-18 18:10:28 +02:00
init_task.c [PATCH] take init_files to fs/file.c 2008-05-16 17:22:20 -04:00
io.c ftrace: support for PowerPC 2008-05-23 22:43:11 +02:00
iomap.c [POWERPC] Add 64-bit resources support to pci_iomap 2007-09-20 07:36:52 -05:00
iommu.c powerpc/pseries: iommu enablement for CMO 2008-07-25 15:44:43 +10:00
irq.c Merge commit '85082fd7cbe3173198aac0eb5e85ab1edcc6352c' into test-build 2008-07-15 15:44:51 +10:00
isa-bridge.c [POWERPC] Remove leftover printk in isa-bridge.c 2008-05-09 20:22:59 +10:00
kgdb.c kgdb, powerpc: arch specific powerpc kgdb support 2008-07-23 11:30:15 -05:00
kprobes.c powerpc/booke: Add kprobes support for booke style processors 2008-06-26 03:35:46 -05:00
l2cr_6xx.S Convert files to UTF-8 and some cleanups 2007-10-19 23:21:04 +02:00
legacy_serial.c powerpc: Fix unterminated of_device_id array in legacy_serial.c 2008-07-07 08:53:49 -07:00
lparcfg.c powerpc/pseries: Add CMO paging statistics 2008-07-25 15:44:42 +10:00
machine_kexec_32.c
machine_kexec_64.c Merge commit 'origin/master' 2008-07-16 11:07:59 +10:00
machine_kexec.c [POWERPC] Fix crashkernel= handling when no crashkernel= specified 2008-04-30 19:49:48 +10:00
Makefile kgdb, powerpc: arch specific powerpc kgdb support 2008-07-23 11:30:15 -05:00
misc_32.S powerpc/kprobes: Some minor fixes 2008-06-26 03:35:33 -05:00
misc_64.S powerpc: fix giveup_vsx to save registers correctly 2008-07-15 12:29:23 +10:00
misc.S powerpc: Add cputable entry for POWER7 2008-06-30 22:31:11 +10:00
module_32.c powerpc: Move common module code into its own file 2008-07-01 11:28:05 +10:00
module_64.c powerpc: Add PPC_NOP_INSTR, a hash define for the preferred nop instruction 2008-07-01 11:28:23 +10:00
module.c powerpc: Fixup lwsync at runtime 2008-07-03 16:58:10 +10:00
msi.c [POWERPC] Fix sparse warnings in arch/powerpc/kernel 2008-05-14 22:31:59 +10:00
nvram_64.c [POWERPC] pseries: Eliminate global error_log_cnt variable 2007-08-17 11:01:52 +10:00
of_device.c [POWERPC] Move of_device_get_modalias to drivers/of 2008-05-16 23:22:28 +10:00
of_platform.c powerpc: Add missing reference to coherent_dma_mask 2008-07-08 21:06:35 -07:00
paca.c [POWERPC] Raise the upper limit of NR_CPUS and move the pacas into the BSS 2008-04-24 20:58:04 +10:00
pci_32.c [POWERPC] Remove update_bridge_resource 2008-01-23 19:32:30 -06:00
pci_64.c [POWERPC] Use dev_set_name in pci_64.c 2008-06-09 11:32:40 +10:00
pci_dn.c [POWERPC] iSeries: eliminate pci_dn bussubno 2008-01-17 14:57:05 +11:00
pci-common.c powerpc: Fix OF parsing of 64 bits PCI addresses 2008-07-22 10:39:34 +10:00
pmc.c [POWERPC] Made FSL Book-E PMC support more generic 2008-02-05 23:34:14 -06:00
ppc32.h powerpc: Add VSX context save/restore, ptrace and signal support 2008-07-01 11:28:50 +10:00
ppc_ksyms.c Merge commit '85082fd7cbe3173198aac0eb5e85ab1edcc6352c' into test-build 2008-07-15 15:44:51 +10:00
proc_ppc64.c powerpc: use non-racy method for proc entries creation 2008-04-29 08:06:22 -07:00
process.c powerpc: BookE hardware watchpoint support 2008-07-25 15:44:39 +10:00
prom_init_check.sh [POWERPC] Fix -Os kernel builds with newer gcc versions 2008-06-16 15:00:54 +10:00
prom_init.c powerpc: Tell firmware we support architecture V2.06 2008-07-01 11:28:00 +10:00
prom_parse.c powerpc: Fix OF parsing of 64 bits PCI addresses 2008-07-22 10:39:34 +10:00
prom.c powerpc: Add VSX CPU feature 2008-07-01 11:28:47 +10:00
ptrace32.c powerpc: Add macros to access floating point registers in thread_struct. 2008-07-01 11:28:43 +10:00
ptrace.c powerpc: BookE hardware watchpoint support 2008-07-25 15:44:39 +10:00
rtas_flash.c [POWERPC] Fix sparse warnings in arch/powerpc/kernel 2008-05-14 22:31:59 +10:00
rtas_pci.c [POWERPC] Fix sparse warnings in arch/powerpc/kernel 2008-05-14 22:31:59 +10:00
rtas-proc.c [POWERPC] Fix sparse warnings in arch/powerpc/kernel 2008-05-14 22:31:59 +10:00
rtas-rtc.c
rtas.c Merge commit 'origin/master' 2008-07-16 11:07:59 +10:00
setup_32.c kgdb, powerpc: arch specific powerpc kgdb support 2008-07-23 11:30:15 -05:00
setup_64.c powerpc: Fixup lwsync at runtime 2008-07-03 16:58:10 +10:00
setup-common.c powerpc: Add the PC speaker only when requested 2008-06-09 13:42:30 +10:00
setup.h
signal_32.c powerpc: fix giveup_vsx to save registers correctly 2008-07-15 12:29:23 +10:00
signal_64.c powerpc: fix giveup_vsx to save registers correctly 2008-07-15 12:29:23 +10:00
signal.c powerpc: BookE hardware watchpoint support 2008-07-25 15:44:39 +10:00
signal.h powerpc: Clean up copy_to/from_user for vsx and fpr 2008-07-03 16:58:11 +10:00
smp-tbsync.c
smp.c Merge commit 'origin/master' 2008-07-16 11:07:59 +10:00
softemu8xx.c powerpc: Add macros to access floating point registers in thread_struct. 2008-07-01 11:28:43 +10:00
stacktrace.c powerpc: Fix support for latencytop 2008-07-22 10:39:33 +10:00
suspend.c PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures 2008-07-24 10:47:21 -07:00
swsusp_32.S [POWERPC] Make altivec code in swsusp_32.S depend on CONFIG_ALTIVEC 2007-11-08 14:15:34 +11:00
swsusp_64.c
swsusp_asm64.S
swsusp.c powerpc: fixup hard_irq_disable semantics 2007-05-11 08:29:34 -07:00
sys_ppc32.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc 2008-04-21 15:50:49 -07:00
syscalls.c powerpc/mm: Add Strong Access Ordering support 2008-07-09 16:30:45 +10:00
sysfs.c powerpc: Fallout from sysdev API changes 2008-07-25 15:44:39 +10:00
systbl_chk.c [POWERPC] Fix a couple of copyright symbols 2008-01-25 22:52:50 +11:00
systbl_chk.sh [POWERPC] Fix a couple of copyright symbols 2008-01-25 22:52:50 +11:00
systbl.S [POWERPC] Align the sys_call_table 2007-10-11 14:36:47 +10:00
tau_6xx.c on_each_cpu(): kill unused 'retry' parameter 2008-06-26 11:24:38 +02:00
time.c Merge commit 'origin/master' 2008-07-16 11:07:59 +10:00
traps.c powerpc: BookE hardware watchpoint support 2008-07-25 15:44:39 +10:00
udbg_16550.c [POWERPC] 4xx: Add early udbg support for 40x processors 2007-12-23 13:13:03 -06:00
udbg.c [POWERPC] Mark udbg console as CON_ANYTIME, ie. callable early in boot 2008-04-24 21:08:11 +10:00
vdso.c powerpc: Fixup lwsync at runtime 2008-07-03 16:58:10 +10:00
vecemu.c
vector.S
vio.c powerpc/pseries: vio bus support for CMO 2008-07-25 15:44:43 +10:00
vmlinux.lds.S powerpc: Fix compile error with binutils 2.15 2008-07-25 15:44:40 +10:00