linux_dsm_epyc7002/arch/s390/Kconfig

879 lines
23 KiB
Plaintext
Raw Normal View History

config MMU
def_bool y
config ZONE_DMA
def_bool y
config CPU_BIG_ENDIAN
def_bool y
config LOCKDEP_SUPPORT
def_bool y
config STACKTRACE_SUPPORT
def_bool y
config RWSEM_GENERIC_SPINLOCK
bool
config RWSEM_XCHGADD_ALGORITHM
def_bool y
config ARCH_HAS_ILOG2_U32
def_bool n
config ARCH_HAS_ILOG2_U64
def_bool n
config GENERIC_HWEIGHT
def_bool y
config GENERIC_BUG
def_bool y if BUG
config GENERIC_BUG_RELATIVE_POINTERS
def_bool y
config ARCH_DMA_ADDR_T_64BIT
def_bool y
config GENERIC_LOCKBREAK
def_bool y if SMP && PREEMPT
config PGSTE
def_bool y if KVM
config ARCH_SUPPORTS_DEBUG_PAGEALLOC
def_bool y
config KEXEC
def_bool y
2015-09-10 05:38:55 +07:00
select KEXEC_CORE
config AUDIT_ARCH
def_bool y
config NO_IOPORT_MAP
def_bool y
config PCI_QUIRKS
def_bool n
config ARCH_SUPPORTS_UPROBES
def_bool y
config DEBUG_RODATA
def_bool y
config S390
def_bool y
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
select ARCH_HAS_DEVMEM_IS_ALLOWED
mm: expose arch_mmap_rnd when available When an architecture fully supports randomizing the ELF load location, a per-arch mmap_rnd() function is used to find a randomized mmap base. In preparation for randomizing the location of ET_DYN binaries separately from mmap, this renames and exports these functions as arch_mmap_rnd(). Additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE for describing this feature on architectures that support it (which is a superset of ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390 already supports a separated ET_DYN ASLR from mmap ASLR without the ARCH_BINFMT_ELF_RANDOMIZE_PIE logic). Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Russell King <linux@arm.linux.org.uk> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: "David A. Long" <dave.long@linaro.org> Cc: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Arun Chandran <achandran@mvista.com> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Min-Hua Chen <orca.chen@gmail.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Alex Smith <alex@alex-smith.me.uk> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Vineeth Vijayan <vvijayan@mvista.com> Cc: Jeff Bailey <jeffbailey@google.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Behan Webster <behanw@converseincode.com> Cc: Ismael Ripoll <iripoll@upv.es> Cc: Jan-Simon Mller <dl9pf@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15 05:48:00 +07:00
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_SG_CHAIN
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_INLINE_READ_LOCK
select ARCH_INLINE_READ_LOCK_BH
select ARCH_INLINE_READ_LOCK_IRQ
select ARCH_INLINE_READ_LOCK_IRQSAVE
select ARCH_INLINE_READ_TRYLOCK
select ARCH_INLINE_READ_UNLOCK
select ARCH_INLINE_READ_UNLOCK_BH
select ARCH_INLINE_READ_UNLOCK_IRQ
select ARCH_INLINE_READ_UNLOCK_IRQRESTORE
select ARCH_INLINE_SPIN_LOCK
select ARCH_INLINE_SPIN_LOCK_BH
select ARCH_INLINE_SPIN_LOCK_IRQ
select ARCH_INLINE_SPIN_LOCK_IRQSAVE
select ARCH_INLINE_SPIN_TRYLOCK
select ARCH_INLINE_SPIN_TRYLOCK_BH
select ARCH_INLINE_SPIN_UNLOCK
select ARCH_INLINE_SPIN_UNLOCK_BH
select ARCH_INLINE_SPIN_UNLOCK_IRQ
select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE
select ARCH_INLINE_WRITE_LOCK
select ARCH_INLINE_WRITE_LOCK_BH
select ARCH_INLINE_WRITE_LOCK_IRQ
select ARCH_INLINE_WRITE_LOCK_IRQSAVE
select ARCH_INLINE_WRITE_TRYLOCK
select ARCH_INLINE_WRITE_UNLOCK
select ARCH_INLINE_WRITE_UNLOCK_BH
select ARCH_INLINE_WRITE_UNLOCK_IRQ
select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE
select ARCH_SAVE_PAGE_KEYS if HIBERNATION
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_WANTS_DYNAMIC_TASK_STRUCT
select ARCH_WANTS_PROT_NUMA_PROT_NONE
select ARCH_WANT_IPC_PARSE_VERSION
select BUILDTIME_EXTABLE_SORT
select CLONE_BACKWARDS2
select DYNAMIC_FTRACE if FUNCTION_TRACER
select GENERIC_CLOCKEVENTS
select GENERIC_CPU_AUTOPROBE
select GENERIC_CPU_DEVICES if !SMP
select GENERIC_FIND_FIRST_BIT
select GENERIC_SMP_IDLE_THREAD
select GENERIC_TIME_VSYSCALL
select HAVE_ALIGNED_STRUCT_PAGE if SLUB
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_EARLY_PFN_TO_NID
select HAVE_ARCH_JUMP_LABEL
lib/GCD.c: use binary GCD algorithm instead of Euclidean The binary GCD algorithm is based on the following facts: 1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2) 2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b) 3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b) Even on x86 machines with reasonable division hardware, the binary algorithm runs about 25% faster (80% the execution time) than the division-based Euclidian algorithm. On platforms like Alpha and ARMv6 where division is a function call to emulation code, it's even more significant. There are two variants of the code here, depending on whether a fast __ffs (find least significant set bit) instruction is available. This allows the unpredictable branches in the bit-at-a-time shifting loop to be eliminated. If fast __ffs is not available, the "even/odd" GCD variant is used. I use the following code to benchmark: #include <stdio.h> #include <stdlib.h> #include <stdint.h> #include <string.h> #include <time.h> #include <unistd.h> #define swap(a, b) \ do { \ a ^= b; \ b ^= a; \ a ^= b; \ } while (0) unsigned long gcd0(unsigned long a, unsigned long b) { unsigned long r; if (a < b) { swap(a, b); } if (b == 0) return a; while ((r = a % b) != 0) { a = b; b = r; } return b; } unsigned long gcd1(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; b >>= __builtin_ctzl(b); for (;;) { a >>= __builtin_ctzl(a); if (a == b) return a << __builtin_ctzl(r); if (a < b) swap(a, b); a -= b; } } unsigned long gcd2(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; r &= -r; while (!(b & r)) b >>= 1; for (;;) { while (!(a & r)) a >>= 1; if (a == b) return a; if (a < b) swap(a, b); a -= b; a >>= 1; if (a & r) a += b; a >>= 1; } } unsigned long gcd3(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; b >>= __builtin_ctzl(b); if (b == 1) return r & -r; for (;;) { a >>= __builtin_ctzl(a); if (a == 1) return r & -r; if (a == b) return a << __builtin_ctzl(r); if (a < b) swap(a, b); a -= b; } } unsigned long gcd4(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; r &= -r; while (!(b & r)) b >>= 1; if (b == r) return r; for (;;) { while (!(a & r)) a >>= 1; if (a == r) return r; if (a == b) return a; if (a < b) swap(a, b); a -= b; a >>= 1; if (a & r) a += b; a >>= 1; } } static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = { gcd0, gcd1, gcd2, gcd3, gcd4, }; #define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0])) #if defined(__x86_64__) #define rdtscll(val) do { \ unsigned long __a,__d; \ __asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \ (val) = ((unsigned long long)__a) | (((unsigned long long)__d)<<32); \ } while(0) static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long), unsigned long a, unsigned long b, unsigned long *res) { unsigned long long start, end; unsigned long long ret; unsigned long gcd_res; rdtscll(start); gcd_res = gcd(a, b); rdtscll(end); if (end >= start) ret = end - start; else ret = ~0ULL - start + 1 + end; *res = gcd_res; return ret; } #else static inline struct timespec read_time(void) { struct timespec time; clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time); return time; } static inline unsigned long long diff_time(struct timespec start, struct timespec end) { struct timespec temp; if ((end.tv_nsec - start.tv_nsec) < 0) { temp.tv_sec = end.tv_sec - start.tv_sec - 1; temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec; } else { temp.tv_sec = end.tv_sec - start.tv_sec; temp.tv_nsec = end.tv_nsec - start.tv_nsec; } return temp.tv_sec * 1000000000ULL + temp.tv_nsec; } static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long), unsigned long a, unsigned long b, unsigned long *res) { struct timespec start, end; unsigned long gcd_res; start = read_time(); gcd_res = gcd(a, b); end = read_time(); *res = gcd_res; return diff_time(start, end); } #endif static inline unsigned long get_rand() { if (sizeof(long) == 8) return (unsigned long)rand() << 32 | rand(); else return rand(); } int main(int argc, char **argv) { unsigned int seed = time(0); int loops = 100; int repeats = 1000; unsigned long (*res)[TEST_ENTRIES]; unsigned long long elapsed[TEST_ENTRIES]; int i, j, k; for (;;) { int opt = getopt(argc, argv, "n:r:s:"); /* End condition always first */ if (opt == -1) break; switch (opt) { case 'n': loops = atoi(optarg); break; case 'r': repeats = atoi(optarg); break; case 's': seed = strtoul(optarg, NULL, 10); break; default: /* You won't actually get here. */ break; } } res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops); memset(elapsed, 0, sizeof(elapsed)); srand(seed); for (j = 0; j < loops; j++) { unsigned long a = get_rand(); /* Do we have args? */ unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand(); unsigned long long min_elapsed[TEST_ENTRIES]; for (k = 0; k < repeats; k++) { for (i = 0; i < TEST_ENTRIES; i++) { unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]); if (k == 0 || min_elapsed[i] > tmp) min_elapsed[i] = tmp; } } for (i = 0; i < TEST_ENTRIES; i++) elapsed[i] += min_elapsed[i]; } for (i = 0; i < TEST_ENTRIES; i++) printf("gcd%d: elapsed %llu\n", i, elapsed[i]); k = 0; srand(seed); for (j = 0; j < loops; j++) { unsigned long a = get_rand(); unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand(); for (i = 1; i < TEST_ENTRIES; i++) { if (res[j][i] != res[j][0]) break; } if (i < TEST_ENTRIES) { if (k == 0) { k = 1; fprintf(stderr, "Error:\n"); } fprintf(stderr, "gcd(%lu, %lu): ", a, b); for (i = 0; i < TEST_ENTRIES; i++) fprintf(stderr, "%ld%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n"); } } if (k == 0) fprintf(stderr, "PASS\n"); free(res); return 0; } Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got: zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 10174 gcd1: elapsed 2120 gcd2: elapsed 2902 gcd3: elapsed 2039 gcd4: elapsed 2812 PASS zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 9309 gcd1: elapsed 2280 gcd2: elapsed 2822 gcd3: elapsed 2217 gcd4: elapsed 2710 PASS zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 9589 gcd1: elapsed 2098 gcd2: elapsed 2815 gcd3: elapsed 2030 gcd4: elapsed 2718 PASS zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 9914 gcd1: elapsed 2309 gcd2: elapsed 2779 gcd3: elapsed 2228 gcd4: elapsed 2709 PASS [akpm@linux-foundation.org: avoid #defining a CONFIG_ variable] Signed-off-by: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com> Signed-off-by: George Spelvin <linux@horizon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-21 07:03:57 +07:00
select CPU_NO_EFFICIENT_FFS if !HAVE_MARCH_Z9_109_FEATURES
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_SOFT_DIRTY
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_EBPF_JIT if PACK_STACK && HAVE_MARCH_Z196_FEATURES
select HAVE_CMPXCHG_DOUBLE
select HAVE_CMPXCHG_LOCAL
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_API_DEBUG
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_REGS
exit_thread: remove empty bodies Define HAVE_EXIT_THREAD for archs which want to do something in exit_thread. For others, let's define exit_thread as an empty inline. This is a cleanup before we change the prototype of exit_thread to accept a task parameter. [akpm@linux-foundation.org: fix mips] Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: "David S. Miller" <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Howells <dhowells@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonas Bonn <jonas@southpole.se> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mikael Starvik <starvik@axis.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@arm.linux.org.uk> Cc: Steven Miao <realmz6@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-21 07:00:16 +07:00
select HAVE_EXIT_THREAD
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER
select HAVE_FUTEX_CMPXCHG if FUTEX
select HAVE_KERNEL_BZIP2
select HAVE_KERNEL_GZIP
select HAVE_KERNEL_LZ4
select HAVE_KERNEL_LZMA
select HAVE_KERNEL_LZO
select HAVE_KERNEL_XZ
select HAVE_KPROBES
select HAVE_KRETPROBES
select HAVE_KVM
select HAVE_LIVEPATCH
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP
select HAVE_MEMBLOCK_PHYS_MAP
2012-09-28 12:01:03 +07:00
select HAVE_MOD_ARCH_SPECIFIC
select HAVE_OPROFILE
select HAVE_PERF_EVENTS
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
2012-09-28 12:01:03 +07:00
select MODULES_USE_ELF_RELA
select NO_BOOTMEM
select OLD_SIGACTION
select OLD_SIGSUSPEND3
select SYSCTL_EXCEPTION_TRACE
select TTY
select VIRT_CPU_ACCOUNTING
select VIRT_TO_BUS
printk/nmi: generic solution for safe printk in NMI printk() takes some locks and could not be used a safe way in NMI context. The chance of a deadlock is real especially when printing stacks from all CPUs. This particular problem has been addressed on x86 by the commit a9edc8809328 ("x86/nmi: Perform a safe NMI stack trace on all CPUs"). The patchset brings two big advantages. First, it makes the NMI backtraces safe on all architectures for free. Second, it makes all NMI messages almost safe on all architectures (the temporary buffer is limited. We still should keep the number of messages in NMI context at minimum). Note that there already are several messages printed in NMI context: WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE handlers. These are not easy to avoid. This patch reuses most of the code and makes it generic. It is useful for all messages and architectures that support NMI. The alternative printk_func is set when entering and is reseted when leaving NMI context. It queues IRQ work to copy the messages into the main ring buffer in a safe context. __printk_nmi_flush() copies all available messages and reset the buffer. Then we could use a simple cmpxchg operations to get synchronized with writers. There is also used a spinlock to get synchronized with other flushers. We do not longer use seq_buf because it depends on external lock. It would be hard to make all supported operations safe for a lockless use. It would be confusing and error prone to make only some operations safe. The code is put into separate printk/nmi.c as suggested by Steven Rostedt. It needs a per-CPU buffer and is compiled only on architectures that call nmi_enter(). This is achieved by the new HAVE_NMI Kconfig flag. The are MN10300 and Xtensa architectures. We need to clean up NMI handling there first. Let's do it separately. The patch is heavily based on the draft from Peter Zijlstra, see https://lkml.org/lkml/2015/6/10/327 [arnd@arndb.de: printk-nmi: use %zu format string for size_t] [akpm@linux-foundation.org: min_t->min - all types are size_t here] Signed-off-by: Petr Mladek <pmladek@suse.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jan Kara <jack@suse.cz> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> [arm part] Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jiri Kosina <jkosina@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-21 07:00:33 +07:00
select HAVE_NMI
config SCHED_OMIT_FRAME_POINTER
def_bool y
config PGTABLE_LEVELS
int
default 4
source "init/Kconfig"
container freezer: implement freezer cgroup subsystem This patch implements a new freezer subsystem in the control groups framework. It provides a way to stop and resume execution of all tasks in a cgroup by writing in the cgroup filesystem. The freezer subsystem in the container filesystem defines a file named freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in the cgroup. Reading will return the current state. * Examples of usage : # mkdir /containers/freezer # mount -t cgroup -ofreezer freezer /containers # mkdir /containers/0 # echo $some_pid > /containers/0/tasks to get status of the freezer subsystem : # cat /containers/0/freezer.state RUNNING to freeze all tasks in the container : # echo FROZEN > /containers/0/freezer.state # cat /containers/0/freezer.state FREEZING # cat /containers/0/freezer.state FROZEN to unfreeze all tasks in the container : # echo RUNNING > /containers/0/freezer.state # cat /containers/0/freezer.state RUNNING This is the basic mechanism which should do the right thing for user space task in a simple scenario. It's important to note that freezing can be incomplete. In that case we return EBUSY. This means that some tasks in the cgroup are busy doing something that prevents us from completely freezing the cgroup at this time. After EBUSY, the cgroup will remain partially frozen -- reflected by freezer.state reporting "FREEZING" when read. The state will remain "FREEZING" until one of these things happens: 1) Userspace cancels the freezing operation by writing "RUNNING" to the freezer.state file 2) Userspace retries the freezing operation by writing "FROZEN" to the freezer.state file (writing "FREEZING" is not legal and returns EIO) 3) The tasks that blocked the cgroup from entering the "FROZEN" state disappear from the cgroup's set of tasks. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: export thaw_process] Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Tested-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-19 10:27:21 +07:00
source "kernel/Kconfig.freezer"
source "kernel/livepatch/Kconfig"
menu "Processor type and features"
config HAVE_MARCH_Z900_FEATURES
def_bool n
config HAVE_MARCH_Z990_FEATURES
def_bool n
select HAVE_MARCH_Z900_FEATURES
config HAVE_MARCH_Z9_109_FEATURES
def_bool n
select HAVE_MARCH_Z990_FEATURES
config HAVE_MARCH_Z10_FEATURES
def_bool n
select HAVE_MARCH_Z9_109_FEATURES
config HAVE_MARCH_Z196_FEATURES
def_bool n
select HAVE_MARCH_Z10_FEATURES
config HAVE_MARCH_ZEC12_FEATURES
def_bool n
select HAVE_MARCH_Z196_FEATURES
config HAVE_MARCH_Z13_FEATURES
def_bool n
select HAVE_MARCH_ZEC12_FEATURES
choice
prompt "Processor type"
default MARCH_Z196
config MARCH_Z900
bool "IBM zSeries model z800 and z900"
select HAVE_MARCH_Z900_FEATURES
help
Select this to enable optimizations for model z800/z900 (2064 and
2066 series). This will enable some optimizations that are not
available on older ESA/390 (31 Bit) only CPUs.
config MARCH_Z990
bool "IBM zSeries model z890 and z990"
select HAVE_MARCH_Z990_FEATURES
help
Select this to enable optimizations for model z890/z990 (2084 and
2086 series). The kernel will be slightly faster but will not work
on older machines.
config MARCH_Z9_109
bool "IBM System z9"
select HAVE_MARCH_Z9_109_FEATURES
help
Select this to enable optimizations for IBM System z9 (2094 and
2096 series). The kernel will be slightly faster but will not work
on older machines.
config MARCH_Z10
bool "IBM System z10"
select HAVE_MARCH_Z10_FEATURES
help
Select this to enable optimizations for IBM System z10 (2097 and
2098 series). The kernel will be slightly faster but will not work
on older machines.
config MARCH_Z196
bool "IBM zEnterprise 114 and 196"
select HAVE_MARCH_Z196_FEATURES
help
Select this to enable optimizations for IBM zEnterprise 114 and 196
(2818 and 2817 series). The kernel will be slightly faster but will
not work on older machines.
config MARCH_ZEC12
bool "IBM zBC12 and zEC12"
select HAVE_MARCH_ZEC12_FEATURES
help
Select this to enable optimizations for IBM zBC12 and zEC12 (2828 and
2827 series). The kernel will be slightly faster but will not work on
older machines.
config MARCH_Z13
bool "IBM z13s and z13"
select HAVE_MARCH_Z13_FEATURES
help
Select this to enable optimizations for IBM z13s and z13 (2965 and
2964 series). The kernel will be slightly faster but will not work on
older machines.
endchoice
config MARCH_Z900_TUNE
def_bool TUNE_Z900 || MARCH_Z900 && TUNE_DEFAULT
config MARCH_Z990_TUNE
def_bool TUNE_Z990 || MARCH_Z990 && TUNE_DEFAULT
config MARCH_Z9_109_TUNE
def_bool TUNE_Z9_109 || MARCH_Z9_109 && TUNE_DEFAULT
config MARCH_Z10_TUNE
def_bool TUNE_Z10 || MARCH_Z10 && TUNE_DEFAULT
config MARCH_Z196_TUNE
def_bool TUNE_Z196 || MARCH_Z196 && TUNE_DEFAULT
config MARCH_ZEC12_TUNE
def_bool TUNE_ZEC12 || MARCH_ZEC12 && TUNE_DEFAULT
config MARCH_Z13_TUNE
def_bool TUNE_Z13 || MARCH_Z13 && TUNE_DEFAULT
choice
prompt "Tune code generation"
default TUNE_DEFAULT
help
Cause the compiler to tune (-mtune) the generated code for a machine.
This will make the code run faster on the selected machine but
somewhat slower on other machines.
This option only changes how the compiler emits instructions, not the
selection of instructions itself, so the resulting kernel will run on
all other machines.
config TUNE_DEFAULT
bool "Default"
help
Tune the generated code for the target processor for which the kernel
will be compiled.
config TUNE_Z900
bool "IBM zSeries model z800 and z900"
config TUNE_Z990
bool "IBM zSeries model z890 and z990"
config TUNE_Z9_109
bool "IBM System z9"
config TUNE_Z10
bool "IBM System z10"
config TUNE_Z196
bool "IBM zEnterprise 114 and 196"
config TUNE_ZEC12
bool "IBM zBC12 and zEC12"
config TUNE_Z13
bool "IBM z13"
endchoice
config 64BIT
def_bool y
config COMPAT
def_bool y
prompt "Kernel support for 31 bit emulation"
select COMPAT_BINFMT_ELF if BINFMT_ELF
select ARCH_WANT_OLD_COMPAT_IPC
select COMPAT_OLD_SIGACTION
kernel: conditionally support non-root users, groups and capabilities There are a lot of embedded systems that run most or all of their functionality in init, running as root:root. For these systems, supporting multiple users is not necessary. This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for non-root users, non-root groups, and capabilities optional. It is enabled under CONFIG_EXPERT menu. When this symbol is not defined, UID and GID are zero in any possible case and processes always have all capabilities. The following syscalls are compiled out: setuid, setregid, setgid, setreuid, setresuid, getresuid, setresgid, getresgid, setgroups, getgroups, setfsuid, setfsgid, capget, capset. Also, groups.c is compiled out completely. In kernel/capability.c, capable function was moved in order to avoid adding two ifdef blocks. This change saves about 25 KB on a defconfig build. The most minimal kernels have total text sizes in the high hundreds of kB rather than low MB. (The 25k goes down a bit with allnoconfig, but not that much. The kernel was booted in Qemu. All the common functionalities work. Adding users/groups is not possible, failing with -ENOSYS. Bloat-o-meter output: add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650) [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Iulia Manda <iulia.manda21@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-16 06:16:41 +07:00
depends on MULTIUSER
help
Select this option if you want to enable your system kernel to
handle system-calls from ELF binaries for 31 bit ESA. This option
(and some other stuff like libraries and such) is needed for
executing 31 bit applications. It is safe to say "Y".
config SYSVIPC_COMPAT
def_bool y if COMPAT && SYSVIPC
config KEYS_COMPAT
def_bool y if COMPAT && KEYS
config SMP
def_bool y
prompt "Symmetric multi-processing support"
---help---
This enables support for systems with more than one CPU. If you have
a system with only one CPU, like most personal computers, say N. If
you have a system with more than one CPU, say Y.
If you say N here, the kernel will run on uni- and multiprocessor
machines, but will use only one CPU of a multiprocessor machine. If
you say Y here, the kernel will run on many, but not all,
uniprocessor machines. On a uniprocessor machine, the kernel
will run faster if you say N here.
See also the SMP-HOWTO available at
<http://www.tldp.org/docs.html#howto>.
Even if you don't know what to do here, say Y.
config NR_CPUS
int "Maximum number of CPUs (2-512)"
range 2 512
depends on SMP
default "64"
help
This allows you to specify the maximum number of CPUs which this
kernel will support. The maximum supported value is 512 and the
minimum value which makes sense is 2.
This is purely to save memory - each supported CPU adds
approximately sixteen kilobytes to the kernel image.
config HOTPLUG_CPU
def_bool y
prompt "Support for hot-pluggable CPUs"
depends on SMP
help
Say Y here to be able to turn CPUs off and on. CPUs
can be controlled through /sys/devices/system/cpu/cpu#.
Say N if you want to disable CPU hotplug.
# Some NUMA nodes have memory ranges that span
# other nodes. Even though a pfn is valid and
# between a node's start and end pfns, it may not
# reside on that node. See memmap_init_zone()
# for details. <- They meant memory holes!
config NODES_SPAN_OTHER_NODES
def_bool NUMA
config NUMA
bool "NUMA support"
depends on SMP && SCHED_TOPOLOGY
default n
help
Enable NUMA support
This option adds NUMA support to the kernel.
An operation mode can be selected by appending
numa=<method> to the kernel command line.
The default behaviour is identical to appending numa=plain to
the command line. This will create just one node with all
available memory and all CPUs in it.
config NODES_SHIFT
int "Maximum NUMA nodes (as a power of 2)"
range 1 10
depends on NUMA
default "4"
help
Specify the maximum number of NUMA nodes available on the target
system. Increases memory reserved to accommodate various tables.
menu "Select NUMA modes"
depends on NUMA
config NUMA_EMU
bool "NUMA emulation"
default y
help
Numa emulation mode will split the available system memory into
equal chunks which then are distributed over the configured number
of nodes in a round-robin manner.
The number of fake nodes is limited by the number of available memory
chunks (i.e. memory size / fake size) and the number of supported
nodes in the kernel.
The CPUs are assigned to the nodes in a way that partially respects
the original machine topology (if supported by the machine).
Fair distribution of the CPUs is not guaranteed.
config EMU_SIZE
hex "NUMA emulation memory chunk size"
default 0x10000000
range 0x400000 0x100000000
depends on NUMA_EMU
help
Select the default size by which the memory is chopped and then
assigned to emulated NUMA nodes.
This can be overridden by specifying
emu_size=<n>
on the kernel command line where also suffixes K, M, G, and T are
supported.
endmenu
config SCHED_SMT
def_bool n
config SCHED_MC
def_bool n
config SCHED_BOOK
def_bool n
config SCHED_TOPOLOGY
def_bool y
prompt "Topology scheduler support"
depends on SMP
select SCHED_SMT
select SCHED_MC
select SCHED_BOOK
help
Topology scheduler support improves the CPU scheduler's decision
making when dealing with machines that have multi-threading,
multiple cores or multiple books.
source kernel/Kconfig.preempt
source kernel/Kconfig.hz
endmenu
menu "Memory setup"
config ARCH_SPARSEMEM_ENABLE
def_bool y
select SPARSEMEM_VMEMMAP_ENABLE
select SPARSEMEM_VMEMMAP
config ARCH_SPARSEMEM_DEFAULT
def_bool y
config ARCH_SELECT_MEMORY_MODEL
def_bool y
config ARCH_ENABLE_MEMORY_HOTPLUG
def_bool y if SPARSEMEM
config ARCH_ENABLE_MEMORY_HOTREMOVE
def_bool y
config ARCH_ENABLE_SPLIT_PMD_PTLOCK
def_bool y
config FORCE_MAX_ZONEORDER
int
default "9"
source "mm/Kconfig"
config PACK_STACK
def_bool y
prompt "Pack kernel stack"
help
This option enables the compiler option -mkernel-backchain if it
is available. If the option is available the compiler supports
the new stack layout which dramatically reduces the minimum stack
frame size. With an old compiler a non-leaf function needs a
minimum of 96 bytes on 31 bit and 160 bytes on 64 bit. With
-mkernel-backchain the minimum size drops to 16 byte on 31 bit
and 24 byte on 64 bit.
Say Y if you are unsure.
config CHECK_STACK
def_bool y
prompt "Detect kernel stack overflow"
help
This option enables the compiler option -mstack-guard and
-mstack-size if they are available. If the compiler supports them
it will emit additional code to each function prolog to trigger
an illegal operation if the kernel stack is about to overflow.
Say N if you are unsure.
config STACK_GUARD
int "Size of the guard area (128-1024)"
range 128 1024
depends on CHECK_STACK
default "256"
help
This allows you to specify the size of the guard area at the lower
end of the kernel stack. If the kernel stack points into the guard
area on function entry an illegal operation is triggered. The size
needs to be a power of 2. Please keep in mind that the size of an
interrupt frame is 184 bytes for 31 bit and 328 bytes on 64 bit.
The minimum size for the stack guard should be 256 for 31 bit and
512 for 64 bit.
config WARN_DYNAMIC_STACK
def_bool n
prompt "Emit compiler warnings for function with dynamic stack usage"
help
This option enables the compiler option -mwarn-dynamicstack. If the
compiler supports this options generates warnings for functions
that dynamically allocate stack space using alloca.
Say N if you are unsure.
endmenu
menu "I/O subsystem"
config QDIO
def_tristate y
prompt "QDIO support"
---help---
This driver provides the Queued Direct I/O base support for
IBM System z.
To compile this driver as a module, choose M here: the
module will be called qdio.
If unsure, say Y.
menuconfig PCI
bool "PCI support"
select PCI_MSI
select IOMMU_SUPPORT
help
Enable PCI support.
if PCI
config PCI_NR_FUNCTIONS
int "Maximum number of PCI functions (1-4096)"
range 1 4096
default "64"
help
This allows you to specify the maximum number of PCI functions which
this kernel will support.
config PCI_NR_MSI
int "Maximum number of MSI interrupts (64-32768)"
range 64 32768
default "256"
help
This defines the number of virtual interrupts the kernel will
provide for MSI interrupts. If you configure your system to have
too few drivers will fail to allocate MSI interrupts for all
PCI devices.
source "drivers/pci/Kconfig"
endif # PCI
config PCI_DOMAINS
def_bool PCI
config HAS_IOMEM
def_bool PCI
config IOMMU_HELPER
def_bool PCI
config NEED_SG_DMA_LENGTH
def_bool PCI
config NEED_DMA_MAP_STATE
def_bool PCI
config CHSC_SCH
def_tristate m
prompt "Support for CHSC subchannels"
help
This driver allows usage of CHSC subchannels. A CHSC subchannel
is usually present on LPAR only.
The driver creates a device /dev/chsc, which may be used to
obtain I/O configuration information about the machine and
to issue asynchronous chsc commands (DANGEROUS).
You will usually only want to use this interface on a special
LPAR designated for system management.
To compile this driver as a module, choose M here: the
module will be called chsc_sch.
If unsure, say N.
config SCM_BUS
def_bool y
prompt "SCM bus driver"
help
Bus driver for Storage Class Memory.
config EADM_SCH
def_tristate m
prompt "Support for EADM subchannels"
depends on SCM_BUS
help
This driver allows usage of EADM subchannels. EADM subchannels act
as a communication vehicle for SCM increments.
To compile this driver as a module, choose M here: the
module will be called eadm_sch.
endmenu
menu "Dump support"
config CRASH_DUMP
bool "kernel crash dumps"
depends on SMP
select KEXEC
help
Generate crash dump after being started by kexec.
Crash dump kernels are loaded in the main kernel with kexec-tools
into a specially reserved region and then later executed after
a crash by kdump/kexec.
Refer to <file:Documentation/s390/zfcpdump.txt> for more details on this.
This option also enables s390 zfcpdump.
See also <file:Documentation/s390/zfcpdump.txt>
endmenu
menu "Executable file formats / Emulations"
source "fs/Kconfig.binfmt"
config SECCOMP
def_bool y
prompt "Enable seccomp to safely compute untrusted bytecode"
depends on PROC_FS
help
This kernel feature is useful for number crunching applications
that may need to compute untrusted bytecode during their
execution. By using pipes or other transports made available to
the process as file descriptors supporting the read/write
syscalls, it's possible to isolate those applications in
their own address space using seccomp. Once seccomp is
enabled via /proc/<pid>/seccomp, it cannot be disabled
and the task is only allowed to execute a few safe syscalls
defined by each seccomp mode.
If unsure, say Y.
endmenu
menu "Power Management"
config ARCH_HIBERNATION_POSSIBLE
def_bool y
source "kernel/power/Kconfig"
endmenu
source "net/Kconfig"
config PCMCIA
def_bool n
config CCW
def_bool y
source "drivers/Kconfig"
source "fs/Kconfig"
source "arch/s390/Kconfig.debug"
source "security/Kconfig"
source "crypto/Kconfig"
source "lib/Kconfig"
menu "Virtualization"
config PFAULT
def_bool y
prompt "Pseudo page fault support"
help
Select this option, if you want to use PFAULT pseudo page fault
handling under VM. If running native or in LPAR, this option
has no effect. If your VM does not support PFAULT, PAGEEX
pseudo page fault handling will be used.
Note that VM 4.2 supports PFAULT but has a bug in its
implementation that causes some problems.
Everybody who wants to run Linux under VM != VM4.2 should select
this option.
config SHARED_KERNEL
bool "VM shared kernel support"
depends on !JUMP_LABEL
help
Select this option, if you want to share the text segment of the
Linux kernel between different VM guests. This reduces memory
usage with lots of guests but greatly increases kernel size.
Also if a kernel was IPL'ed from a shared segment the kexec system
call will not work.
You should only select this option if you know what you are
doing and want to exploit this feature.
config CMM
def_tristate n
prompt "Cooperative memory management"
help
Select this option, if you want to enable the kernel interface
to reduce the memory size of the system. This is accomplished
by allocating pages of memory and put them "on hold". This only
makes sense for a system running under VM where the unused pages
will be reused by VM for other guest systems. The interface
allows an external monitor to balance memory of many systems.
Everybody who wants to run Linux under VM should select this
option.
config CMM_IUCV
def_bool y
prompt "IUCV special message interface to cooperative memory management"
depends on CMM && (SMSGIUCV=y || CMM=SMSGIUCV)
help
Select this option to enable the special message interface to
the cooperative memory management.
config APPLDATA_BASE
def_bool n
prompt "Linux - VM Monitor Stream, base infrastructure"
depends on PROC_FS
help
This provides a kernel interface for creating and updating z/VM APPLDATA
monitor records. The monitor records are updated at certain time
intervals, once the timer is started.
Writing 1 or 0 to /proc/appldata/timer starts(1) or stops(0) the timer,
i.e. enables or disables monitoring on the Linux side.
A custom interval value (in seconds) can be written to
/proc/appldata/interval.
Defaults are 60 seconds interval and timer off.
The /proc entries can also be read from, showing the current settings.
config APPLDATA_MEM
def_tristate m
prompt "Monitor memory management statistics"
depends on APPLDATA_BASE && VM_EVENT_COUNTERS
help
This provides memory management related data to the Linux - VM Monitor
Stream, like paging/swapping rate, memory utilisation, etc.
Writing 1 or 0 to /proc/appldata/memory creates(1) or removes(0) a z/VM
APPLDATA monitor record, i.e. enables or disables monitoring this record
on the z/VM side.
Default is disabled.
The /proc entry can also be read from, showing the current settings.
This can also be compiled as a module, which will be called
appldata_mem.o.
config APPLDATA_OS
def_tristate m
prompt "Monitor OS statistics"
depends on APPLDATA_BASE
help
This provides OS related data to the Linux - VM Monitor Stream, like
CPU utilisation, etc.
Writing 1 or 0 to /proc/appldata/os creates(1) or removes(0) a z/VM
APPLDATA monitor record, i.e. enables or disables monitoring this record
on the z/VM side.
Default is disabled.
This can also be compiled as a module, which will be called
appldata_os.o.
config APPLDATA_NET_SUM
def_tristate m
prompt "Monitor overall network statistics"
depends on APPLDATA_BASE && NET
help
This provides network related data to the Linux - VM Monitor Stream,
currently there is only a total sum of network I/O statistics, no
per-interface data.
Writing 1 or 0 to /proc/appldata/net_sum creates(1) or removes(0) a z/VM
APPLDATA monitor record, i.e. enables or disables monitoring this record
on the z/VM side.
Default is disabled.
This can also be compiled as a module, which will be called
appldata_net_sum.o.
[PATCH] s390_hypfs filesystem On zSeries machines there exists an interface which allows the operating system to retrieve LPAR hypervisor accounting data. For example, it is possible to get usage data for physical and virtual cpus. In order to provide this information to user space programs, I implemented a new virtual Linux file system named 's390_hypfs' using the Linux 2.6 libfs framework. The name 's390_hypfs' stands for 'S390 Hypervisor Filesystem'. All the accounting information is put into different virtual files which can be accessed from user space. All data is represented as ASCII strings. When the file system is mounted the accounting information is retrieved and a file system tree is created with the attribute files containing the cpu information. The content of the files remains unchanged until a new update is made. An update can be triggered from user space through writing 'something' into a special purpose update file. We create the following directory structure: <mount-point>/ update cpus/ <cpu-id> type mgmtime <cpu-id> ... hyp/ type systems/ <lpar-name> cpus/ <cpu-id> type mgmtime cputime onlinetime <cpu-id> ... <lpar-name> cpus/ ... - update: File to trigger update - cpus/: Directory for all physical cpus - cpus/<cpu-id>/: Directory for one physical cpu. - cpus/<cpu-id>/type: Type name of physical zSeries cpu. - cpus/<cpu-id>/mgmtime: Physical-LPAR-management time in microseconds. - hyp/: Directory for hypervisor information - hyp/type: Typ of hypervisor (currently only 'LPAR Hypervisor') - systems/: Directory for all LPARs - systems/<lpar-name>/: Directory for one LPAR. - systems/<lpar-name>/cpus/<cpu-id>/: Directory for the virtual cpus - systems/<lpar-name>/cpus/<cpu-id>/type: Typ of cpu. - systems/<lpar-name>/cpus/<cpu-id>/mgmtime: Accumulated number of microseconds during which a physical CPU was assigned to the logical cpu and the cpu time was consumed by the hypervisor and was not provided to the LPAR (LPAR overhead). - systems/<lpar-name>/cpus/<cpu-id>/cputime: Accumulated number of microseconds during which a physical CPU was assigned to the logical cpu and the cpu time was consumed by the LPAR. - systems/<lpar-name>/cpus/<cpu-id>/onlinetime: Accumulated number of microseconds during which the logical CPU has been online. As mount point for the filesystem /sys/hypervisor/s390 is created. The update process is triggered when writing 'something' into the 'update' file at the top level hypfs directory. You can do this e.g. with 'echo 1 > update'. During the update the whole directory structure is deleted and built up again. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Ingo Oeser <ioe-lkml@rameria.de> Cc: Joern Engel <joern@wohnheim.fh-wedel.de> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Michael Holzheu <holzheu@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 16:05:06 +07:00
config S390_HYPFS_FS
def_bool y
prompt "s390 hypervisor file system support"
[PATCH] s390_hypfs filesystem On zSeries machines there exists an interface which allows the operating system to retrieve LPAR hypervisor accounting data. For example, it is possible to get usage data for physical and virtual cpus. In order to provide this information to user space programs, I implemented a new virtual Linux file system named 's390_hypfs' using the Linux 2.6 libfs framework. The name 's390_hypfs' stands for 'S390 Hypervisor Filesystem'. All the accounting information is put into different virtual files which can be accessed from user space. All data is represented as ASCII strings. When the file system is mounted the accounting information is retrieved and a file system tree is created with the attribute files containing the cpu information. The content of the files remains unchanged until a new update is made. An update can be triggered from user space through writing 'something' into a special purpose update file. We create the following directory structure: <mount-point>/ update cpus/ <cpu-id> type mgmtime <cpu-id> ... hyp/ type systems/ <lpar-name> cpus/ <cpu-id> type mgmtime cputime onlinetime <cpu-id> ... <lpar-name> cpus/ ... - update: File to trigger update - cpus/: Directory for all physical cpus - cpus/<cpu-id>/: Directory for one physical cpu. - cpus/<cpu-id>/type: Type name of physical zSeries cpu. - cpus/<cpu-id>/mgmtime: Physical-LPAR-management time in microseconds. - hyp/: Directory for hypervisor information - hyp/type: Typ of hypervisor (currently only 'LPAR Hypervisor') - systems/: Directory for all LPARs - systems/<lpar-name>/: Directory for one LPAR. - systems/<lpar-name>/cpus/<cpu-id>/: Directory for the virtual cpus - systems/<lpar-name>/cpus/<cpu-id>/type: Typ of cpu. - systems/<lpar-name>/cpus/<cpu-id>/mgmtime: Accumulated number of microseconds during which a physical CPU was assigned to the logical cpu and the cpu time was consumed by the hypervisor and was not provided to the LPAR (LPAR overhead). - systems/<lpar-name>/cpus/<cpu-id>/cputime: Accumulated number of microseconds during which a physical CPU was assigned to the logical cpu and the cpu time was consumed by the LPAR. - systems/<lpar-name>/cpus/<cpu-id>/onlinetime: Accumulated number of microseconds during which the logical CPU has been online. As mount point for the filesystem /sys/hypervisor/s390 is created. The update process is triggered when writing 'something' into the 'update' file at the top level hypfs directory. You can do this e.g. with 'echo 1 > update'. During the update the whole directory structure is deleted and built up again. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Ingo Oeser <ioe-lkml@rameria.de> Cc: Joern Engel <joern@wohnheim.fh-wedel.de> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Michael Holzheu <holzheu@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 16:05:06 +07:00
select SYS_HYPERVISOR
help
This is a virtual file system intended to provide accounting
information in an s390 hypervisor environment.
source "arch/s390/kvm/Kconfig"
config S390_GUEST
def_bool y
prompt "s390 support for virtio devices"
select TTY
select VIRTUALIZATION
select VIRTIO
select VIRTIO_CONSOLE
help
Enabling this option adds support for virtio based paravirtual device
drivers on s390.
Select this option if you want to run the kernel as a guest under
the KVM hypervisor.
endmenu