linux_dsm_epyc7002/arch/s390/Kconfig

655 lines
18 KiB
Plaintext
Raw Normal View History

config MMU
def_bool y
config ZONE_DMA
def_bool y if 64BIT
config LOCKDEP_SUPPORT
def_bool y
config STACKTRACE_SUPPORT
def_bool y
config HAVE_LATENCYTOP_SUPPORT
def_bool y
config RWSEM_GENERIC_SPINLOCK
bool
config RWSEM_XCHGADD_ALGORITHM
def_bool y
config ARCH_HAS_ILOG2_U32
def_bool n
config ARCH_HAS_ILOG2_U64
def_bool n
config GENERIC_HWEIGHT
def_bool y
config GENERIC_TIME_VSYSCALL
def_bool y
config GENERIC_CLOCKEVENTS
def_bool y
config GENERIC_BUG
def_bool y if BUG
config GENERIC_BUG_RELATIVE_POINTERS
def_bool y
config NO_IOMEM
def_bool y
config NO_DMA
def_bool y
config ARCH_DMA_ADDR_T_64BIT
def_bool 64BIT
config GENERIC_LOCKBREAK
def_bool y if SMP && PREEMPT
config PGSTE
def_bool y if KVM
config VIRT_CPU_ACCOUNTING
def_bool y
config ARCH_SUPPORTS_DEBUG_PAGEALLOC
def_bool y
config S390
def_bool y
select USE_GENERIC_SMP_HELPERS if SMP
select HAVE_SYSCALL_WRAPPERS
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_TRACE_MCOUNT_TEST
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_C_RECORDMCOUNT
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_DYNAMIC_FTRACE
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_DEFAULT_NO_SPIN_MUTEXES
select HAVE_OPROFILE
select HAVE_KPROBES
select HAVE_KRETPROBES
select HAVE_KVM if 64BIT
select HAVE_ARCH_TRACEHOOK
select INIT_ALL_POSSIBLE
select HAVE_IRQ_WORK
perf: Do the big rename: Performance Counters -> Performance Events Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has grown out its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in one, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion) The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well. Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.) This patch has been generated via the following script: FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/PERF_EVENT_/PERF_RECORD_/g' \ -e 's/PERF_COUNTER/PERF_EVENT/g' \ -e 's/perf_counter/perf_event/g' \ -e 's/nb_counters/nb_events/g' \ -e 's/swcounter/swevent/g' \ -e 's/tpcounter_event/tp_event/g' \ $FILES for N in $(find . -name perf_counter.[ch]); do M=$(echo $N | sed 's/perf_counter/perf_event/g') mv $N $M done FILES=$(find . -name perf_event.*) sed -i \ -e 's/COUNTER_MASK/REG_MASK/g' \ -e 's/COUNTER/EVENT/g' \ -e 's/\<event\>/event_id/g' \ -e 's/counter/event/g' \ -e 's/Counter/Event/g' \ $FILES ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch. ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. ) Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 17:02:48 +07:00
select HAVE_PERF_EVENTS
select HAVE_KERNEL_GZIP
select HAVE_KERNEL_BZIP2
select HAVE_KERNEL_LZMA
select HAVE_KERNEL_LZO
select HAVE_GET_USER_PAGES_FAST
select ARCH_INLINE_SPIN_TRYLOCK
select ARCH_INLINE_SPIN_TRYLOCK_BH
select ARCH_INLINE_SPIN_LOCK
select ARCH_INLINE_SPIN_LOCK_BH
select ARCH_INLINE_SPIN_LOCK_IRQ
select ARCH_INLINE_SPIN_LOCK_IRQSAVE
select ARCH_INLINE_SPIN_UNLOCK
select ARCH_INLINE_SPIN_UNLOCK_BH
select ARCH_INLINE_SPIN_UNLOCK_IRQ
select ARCH_INLINE_SPIN_UNLOCK_IRQRESTORE
select ARCH_INLINE_READ_TRYLOCK
select ARCH_INLINE_READ_LOCK
select ARCH_INLINE_READ_LOCK_BH
select ARCH_INLINE_READ_LOCK_IRQ
select ARCH_INLINE_READ_LOCK_IRQSAVE
select ARCH_INLINE_READ_UNLOCK
select ARCH_INLINE_READ_UNLOCK_BH
select ARCH_INLINE_READ_UNLOCK_IRQ
select ARCH_INLINE_READ_UNLOCK_IRQRESTORE
select ARCH_INLINE_WRITE_TRYLOCK
select ARCH_INLINE_WRITE_LOCK
select ARCH_INLINE_WRITE_LOCK_BH
select ARCH_INLINE_WRITE_LOCK_IRQ
select ARCH_INLINE_WRITE_LOCK_IRQSAVE
select ARCH_INLINE_WRITE_UNLOCK
select ARCH_INLINE_WRITE_UNLOCK_BH
select ARCH_INLINE_WRITE_UNLOCK_IRQ
select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE
config SCHED_OMIT_FRAME_POINTER
def_bool y
source "init/Kconfig"
container freezer: implement freezer cgroup subsystem This patch implements a new freezer subsystem in the control groups framework. It provides a way to stop and resume execution of all tasks in a cgroup by writing in the cgroup filesystem. The freezer subsystem in the container filesystem defines a file named freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in the cgroup. Reading will return the current state. * Examples of usage : # mkdir /containers/freezer # mount -t cgroup -ofreezer freezer /containers # mkdir /containers/0 # echo $some_pid > /containers/0/tasks to get status of the freezer subsystem : # cat /containers/0/freezer.state RUNNING to freeze all tasks in the container : # echo FROZEN > /containers/0/freezer.state # cat /containers/0/freezer.state FREEZING # cat /containers/0/freezer.state FROZEN to unfreeze all tasks in the container : # echo RUNNING > /containers/0/freezer.state # cat /containers/0/freezer.state RUNNING This is the basic mechanism which should do the right thing for user space task in a simple scenario. It's important to note that freezing can be incomplete. In that case we return EBUSY. This means that some tasks in the cgroup are busy doing something that prevents us from completely freezing the cgroup at this time. After EBUSY, the cgroup will remain partially frozen -- reflected by freezer.state reporting "FREEZING" when read. The state will remain "FREEZING" until one of these things happens: 1) Userspace cancels the freezing operation by writing "RUNNING" to the freezer.state file 2) Userspace retries the freezing operation by writing "FROZEN" to the freezer.state file (writing "FREEZING" is not legal and returns EIO) 3) The tasks that blocked the cgroup from entering the "FROZEN" state disappear from the cgroup's set of tasks. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: export thaw_process] Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Tested-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-19 10:27:21 +07:00
source "kernel/Kconfig.freezer"
menu "Base setup"
comment "Processor type and features"
source "kernel/time/Kconfig"
config 64BIT
def_bool y
prompt "64 bit kernel"
help
Select this option if you have an IBM z/Architecture machine
and want to use the 64 bit addressing mode.
config 32BIT
def_bool y if !64BIT
config KTIME_SCALAR
def_bool 32BIT
config SMP
def_bool y
prompt "Symmetric multi-processing support"
---help---
This enables support for systems with more than one CPU. If you have
a system with only one CPU, like most personal computers, say N. If
you have a system with more than one CPU, say Y.
If you say N here, the kernel will run on single and multiprocessor
machines, but will use only one CPU of a multiprocessor machine. If
you say Y here, the kernel will run on many, but not all,
singleprocessor machines. On a singleprocessor machine, the kernel
will run faster if you say N here.
See also the SMP-HOWTO available at
<http://www.tldp.org/docs.html#howto>.
Even if you don't know what to do here, say Y.
config NR_CPUS
int "Maximum number of CPUs (2-64)"
range 2 64
depends on SMP
default "32" if !64BIT
default "64" if 64BIT
help
This allows you to specify the maximum number of CPUs which this
kernel will support. The maximum supported value is 64 and the
minimum value which makes sense is 2.
This is purely to save memory - each supported CPU adds
approximately sixteen kilobytes to the kernel image.
config HOTPLUG_CPU
def_bool y
prompt "Support for hot-pluggable CPUs"
depends on SMP
select HOTPLUG
help
Say Y here to be able to turn CPUs off and on. CPUs
can be controlled through /sys/devices/system/cpu/cpu#.
Say N if you want to disable CPU hotplug.
config SCHED_MC
def_bool y
prompt "Multi-core scheduler support"
depends on SMP
help
Multi-core scheduler support improves the CPU scheduler's decision
making when dealing with multi-core CPU chips at a cost of slightly
increased overhead in some places.
config SCHED_BOOK
def_bool y
prompt "Book scheduler support"
depends on SMP && SCHED_MC
help
Book scheduler support improves the CPU scheduler's decision making
when dealing with machines that have several books.
config MATHEMU
def_bool y
prompt "IEEE FPU emulation"
depends on MARCH_G5
help
This option is required for IEEE compliant floating point arithmetic
on older ESA/390 machines. Say Y unless you know your machine doesn't
need this.
config COMPAT
def_bool y
prompt "Kernel support for 31 bit emulation"
depends on 64BIT
select COMPAT_BINFMT_ELF
help
Select this option if you want to enable your system kernel to
handle system-calls from ELF binaries for 31 bit ESA. This option
(and some other stuff like libraries and such) is needed for
executing 31 bit applications. It is safe to say "Y".
config SYSVIPC_COMPAT
def_bool y if COMPAT && SYSVIPC
config AUDIT_ARCH
def_bool y
[S390] noexec protection This provides a noexec protection on s390 hardware. Our hardware does not have any bits left in the pte for a hw noexec bit, so this is a different approach using shadow page tables and a special addressing mode that allows separate address spaces for code and data. As a special feature of our "secondary-space" addressing mode, separate page tables can be specified for the translation of data addresses (storage operands) and instruction addresses. The shadow page table is used for the instruction addresses and the standard page table for the data addresses. The shadow page table is linked to the standard page table by a pointer in page->lru.next of the struct page corresponding to the page that contains the standard page table (since page->private is not really private with the pte_lock and the page table pages are not in the LRU list). Depending on the software bits of a pte, it is either inserted into both page tables or just into the standard (data) page table. Pages of a vma that does not have the VM_EXEC bit set get mapped only in the data address space. Any try to execute code on such a page will cause a page translation exception. The standard reaction to this is a SIGSEGV with two exceptions: the two system call opcodes 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn) are allowed. They are stored by the kernel to the signal stack frame. Unfortunately, the signal return mechanism cannot be modified to use an SA_RESTORER because the exception unwinding code depends on the system call opcode stored behind the signal stack frame. This feature requires that user space is executed in secondary-space mode and the kernel in home-space mode, which means that the addressing modes need to be switched and that the noexec protection only works for user space. After switching the addressing modes, we cannot use the mvcp/mvcs instructions anymore to copy between kernel and user space. A new mvcos instruction has been added to the z9 EC/BC hardware which allows to copy between arbitrary address spaces, but on older hardware the page tables need to be walked manually. Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-02-06 03:18:17 +07:00
config S390_EXEC_PROTECT
def_bool y
prompt "Data execute protection"
[S390] noexec protection This provides a noexec protection on s390 hardware. Our hardware does not have any bits left in the pte for a hw noexec bit, so this is a different approach using shadow page tables and a special addressing mode that allows separate address spaces for code and data. As a special feature of our "secondary-space" addressing mode, separate page tables can be specified for the translation of data addresses (storage operands) and instruction addresses. The shadow page table is used for the instruction addresses and the standard page table for the data addresses. The shadow page table is linked to the standard page table by a pointer in page->lru.next of the struct page corresponding to the page that contains the standard page table (since page->private is not really private with the pte_lock and the page table pages are not in the LRU list). Depending on the software bits of a pte, it is either inserted into both page tables or just into the standard (data) page table. Pages of a vma that does not have the VM_EXEC bit set get mapped only in the data address space. Any try to execute code on such a page will cause a page translation exception. The standard reaction to this is a SIGSEGV with two exceptions: the two system call opcodes 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn) are allowed. They are stored by the kernel to the signal stack frame. Unfortunately, the signal return mechanism cannot be modified to use an SA_RESTORER because the exception unwinding code depends on the system call opcode stored behind the signal stack frame. This feature requires that user space is executed in secondary-space mode and the kernel in home-space mode, which means that the addressing modes need to be switched and that the noexec protection only works for user space. After switching the addressing modes, we cannot use the mvcp/mvcs instructions anymore to copy between kernel and user space. A new mvcos instruction has been added to the z9 EC/BC hardware which allows to copy between arbitrary address spaces, but on older hardware the page tables need to be walked manually. Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-02-06 03:18:17 +07:00
help
This option allows to enable a buffer overflow protection for user
space programs and it also selects the addressing mode option above.
The kernel parameter noexec=on will enable this feature and also
switch the addressing modes, default is disabled. Enabling this (via
kernel parameter) on machines earlier than IBM System z9 this will
reduce system performance.
[S390] noexec protection This provides a noexec protection on s390 hardware. Our hardware does not have any bits left in the pte for a hw noexec bit, so this is a different approach using shadow page tables and a special addressing mode that allows separate address spaces for code and data. As a special feature of our "secondary-space" addressing mode, separate page tables can be specified for the translation of data addresses (storage operands) and instruction addresses. The shadow page table is used for the instruction addresses and the standard page table for the data addresses. The shadow page table is linked to the standard page table by a pointer in page->lru.next of the struct page corresponding to the page that contains the standard page table (since page->private is not really private with the pte_lock and the page table pages are not in the LRU list). Depending on the software bits of a pte, it is either inserted into both page tables or just into the standard (data) page table. Pages of a vma that does not have the VM_EXEC bit set get mapped only in the data address space. Any try to execute code on such a page will cause a page translation exception. The standard reaction to this is a SIGSEGV with two exceptions: the two system call opcodes 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn) are allowed. They are stored by the kernel to the signal stack frame. Unfortunately, the signal return mechanism cannot be modified to use an SA_RESTORER because the exception unwinding code depends on the system call opcode stored behind the signal stack frame. This feature requires that user space is executed in secondary-space mode and the kernel in home-space mode, which means that the addressing modes need to be switched and that the noexec protection only works for user space. After switching the addressing modes, we cannot use the mvcp/mvcs instructions anymore to copy between kernel and user space. A new mvcos instruction has been added to the z9 EC/BC hardware which allows to copy between arbitrary address spaces, but on older hardware the page tables need to be walked manually. Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-02-06 03:18:17 +07:00
comment "Code generation options"
choice
prompt "Processor type"
default MARCH_G5
config MARCH_G5
bool "System/390 model G5 and G6"
depends on !64BIT
help
Select this to build a 31 bit kernel that works
on all ESA/390 and z/Architecture machines.
config MARCH_Z900
bool "IBM zSeries model z800 and z900"
help
Select this to enable optimizations for model z800/z900 (2064 and
2066 series). This will enable some optimizations that are not
available on older ESA/390 (31 Bit) only CPUs.
config MARCH_Z990
bool "IBM zSeries model z890 and z990"
help
Select this to enable optimizations for model z890/z990 (2084 and
2086 series). The kernel will be slightly faster but will not work
on older machines.
config MARCH_Z9_109
bool "IBM System z9"
help
Select this to enable optimizations for IBM System z9 (2094 and
2096 series). The kernel will be slightly faster but will not work
on older machines.
config MARCH_Z10
bool "IBM System z10"
help
Select this to enable optimizations for IBM System z10 (2097 and
2098 series). The kernel will be slightly faster but will not work
on older machines.
config MARCH_Z196
bool "IBM zEnterprise 196"
help
Select this to enable optimizations for IBM zEnterprise 196
(2817 series). The kernel will be slightly faster but will not work
on older machines.
endchoice
config PACK_STACK
def_bool y
prompt "Pack kernel stack"
help
This option enables the compiler option -mkernel-backchain if it
is available. If the option is available the compiler supports
the new stack layout which dramatically reduces the minimum stack
frame size. With an old compiler a non-leaf function needs a
minimum of 96 bytes on 31 bit and 160 bytes on 64 bit. With
-mkernel-backchain the minimum size drops to 16 byte on 31 bit
and 24 byte on 64 bit.
Say Y if you are unsure.
config SMALL_STACK
def_bool n
prompt "Use 8kb for kernel stack instead of 16kb"
[S390] No more 4kb stacks. We got a stack overflow with a small stack configuration on a 32 bit system. It just looks like as 4kb isn't enough and too dangerous. So lets get rid of 4kb stacks on 32 bit. But one thing I completely dislike about the call trace below is that just for debugging or tracing purposes sprintf gets called (cio_start_key): /* process condition code */ sprintf(dbf_txt, "ccode:%d", ccode); CIO_TRACE_EVENT(4, dbf_txt); But maybe its just me who thinks that this could be done better. <4>Kernel stack overflow. <4>Modules linked in: dm_multipath sunrpc bonding qeth_l2 dm_mod qeth ccwgroup vmur <4>CPU: 1 Not tainted 2.6.27-30.x.20081015-s390default #1 <4>Process httpd (pid: 3807, task: 20ae2df8, ksp: 1666fb78) <4>Krnl PSW : 040c0000 8027098a (number+0xe/0x348) <4> R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 <4>Krnl GPRS: 00d43318 0027097c 1666f277 9666f270 <4> 00000000 00000000 0000000a ffffffff <4> 9666f270 1666f228 1666f277 1666f098 <4> 00000002 80270982 80271016 1666f098 <4>Krnl Code: 8027097e: f0340dd0a7f1 srp 3536(4,%r0),2033(%r10),4 <4> 80270984: 0f00 clcl %r0,%r0 <4> 80270986: a7840001 brc 8,80270988 <4> >8027098a: 18ef lr %r14,%r15 <4> 8027098c: a7faff68 ahi %r15,-152 <4> 80270990: 18bf lr %r11,%r15 <4> 80270992: 18a2 lr %r10,%r2 <4> 80270994: 1893 lr %r9,%r3 Modified calltrace with annotated stackframe size of each function: stackframe size | 0 304 vsnprintf+850 [0x271016] 1 72 sprintf+74 [0x271522] 2 56 cio_start_key+262 [0x2d4c16] 3 56 ccw_device_start_key+222 [0x2dfe92] 4 56 ccw_device_start+40 [0x2dff28] 5 48 raw3215_start_io+104 [0x30b0f8] 6 56 raw3215_write+494 [0x30ba0a] 7 40 con3215_write+68 [0x30bafc] 8 40 __call_console_drivers+146 [0x12b0fa] 9 32 _call_console_drivers+102 [0x12b192] 10 64 release_console_sem+268 [0x12b614] 11 168 vprintk+462 [0x12bca6] 12 72 printk+68 [0x12bfd0] 13 256 __print_symbol+50 [0x15a882] 14 56 __show_trace+162 [0x103d06] 15 32 show_trace+224 [0x103e70] 16 48 show_stack+152 [0x103f20] 17 56 dump_stack+126 [0x104612] 18 96 __alloc_pages_internal+592 [0x175004] 19 80 cache_alloc_refill+776 [0x196f3c] 20 40 __kmalloc+258 [0x1972ae] 21 40 __alloc_skb+94 [0x328086] 22 32 pskb_copy+50 [0x328252] 23 32 skb_realloc_headroom+110 [0x328a72] 24 104 qeth_l2_hard_start_xmit+378 [0x7803bfde] 25 56 dev_hard_start_xmit+450 [0x32ef6e] 26 56 __qdisc_run+390 [0x3425d6] 27 48 dev_queue_xmit+410 [0x331e06] 28 40 ip_finish_output+308 [0x354ac8] 29 56 ip_output+218 [0x355b6e] 30 24 ip_local_out+56 [0x354584] 31 120 ip_queue_xmit+300 [0x355cec] 32 96 tcp_transmit_skb+812 [0x367da8] 33 40 tcp_push_one+158 [0x369fda] 34 112 tcp_sendmsg+852 [0x35d5a0] 35 240 sock_sendmsg+164 [0x32035c] 36 56 kernel_sendmsg+86 [0x32064a] 37 88 sock_no_sendpage+98 [0x322b22] 38 104 tcp_sendpage+70 [0x35cc1e] 39 48 sock_sendpage+74 [0x31eb66] 40 64 pipe_to_sendpage+102 [0x1c4b2e] 41 64 __splice_from_pipe+120 [0x1c5340] 42 72 splice_from_pipe+90 [0x1c57e6] 43 56 generic_splice_sendpage+38 [0x1c5832] 44 48 do_splice_from+104 [0x1c4c38] 45 48 direct_splice_actor+52 [0x1c4c88] 46 80 splice_direct_to_actor+180 [0x1c4f80] 47 72 do_splice_direct+70 [0x1c5112] 48 64 do_sendfile+360 [0x19de18] 49 72 sys_sendfile64+126 [0x19df32] 50 336 sysc_do_restart+18 [0x111a1a] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-10-28 17:10:21 +07:00
depends on PACK_STACK && 64BIT && !LOCKDEP
help
If you say Y here and the compiler supports the -mkernel-backchain
[S390] No more 4kb stacks. We got a stack overflow with a small stack configuration on a 32 bit system. It just looks like as 4kb isn't enough and too dangerous. So lets get rid of 4kb stacks on 32 bit. But one thing I completely dislike about the call trace below is that just for debugging or tracing purposes sprintf gets called (cio_start_key): /* process condition code */ sprintf(dbf_txt, "ccode:%d", ccode); CIO_TRACE_EVENT(4, dbf_txt); But maybe its just me who thinks that this could be done better. <4>Kernel stack overflow. <4>Modules linked in: dm_multipath sunrpc bonding qeth_l2 dm_mod qeth ccwgroup vmur <4>CPU: 1 Not tainted 2.6.27-30.x.20081015-s390default #1 <4>Process httpd (pid: 3807, task: 20ae2df8, ksp: 1666fb78) <4>Krnl PSW : 040c0000 8027098a (number+0xe/0x348) <4> R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 <4>Krnl GPRS: 00d43318 0027097c 1666f277 9666f270 <4> 00000000 00000000 0000000a ffffffff <4> 9666f270 1666f228 1666f277 1666f098 <4> 00000002 80270982 80271016 1666f098 <4>Krnl Code: 8027097e: f0340dd0a7f1 srp 3536(4,%r0),2033(%r10),4 <4> 80270984: 0f00 clcl %r0,%r0 <4> 80270986: a7840001 brc 8,80270988 <4> >8027098a: 18ef lr %r14,%r15 <4> 8027098c: a7faff68 ahi %r15,-152 <4> 80270990: 18bf lr %r11,%r15 <4> 80270992: 18a2 lr %r10,%r2 <4> 80270994: 1893 lr %r9,%r3 Modified calltrace with annotated stackframe size of each function: stackframe size | 0 304 vsnprintf+850 [0x271016] 1 72 sprintf+74 [0x271522] 2 56 cio_start_key+262 [0x2d4c16] 3 56 ccw_device_start_key+222 [0x2dfe92] 4 56 ccw_device_start+40 [0x2dff28] 5 48 raw3215_start_io+104 [0x30b0f8] 6 56 raw3215_write+494 [0x30ba0a] 7 40 con3215_write+68 [0x30bafc] 8 40 __call_console_drivers+146 [0x12b0fa] 9 32 _call_console_drivers+102 [0x12b192] 10 64 release_console_sem+268 [0x12b614] 11 168 vprintk+462 [0x12bca6] 12 72 printk+68 [0x12bfd0] 13 256 __print_symbol+50 [0x15a882] 14 56 __show_trace+162 [0x103d06] 15 32 show_trace+224 [0x103e70] 16 48 show_stack+152 [0x103f20] 17 56 dump_stack+126 [0x104612] 18 96 __alloc_pages_internal+592 [0x175004] 19 80 cache_alloc_refill+776 [0x196f3c] 20 40 __kmalloc+258 [0x1972ae] 21 40 __alloc_skb+94 [0x328086] 22 32 pskb_copy+50 [0x328252] 23 32 skb_realloc_headroom+110 [0x328a72] 24 104 qeth_l2_hard_start_xmit+378 [0x7803bfde] 25 56 dev_hard_start_xmit+450 [0x32ef6e] 26 56 __qdisc_run+390 [0x3425d6] 27 48 dev_queue_xmit+410 [0x331e06] 28 40 ip_finish_output+308 [0x354ac8] 29 56 ip_output+218 [0x355b6e] 30 24 ip_local_out+56 [0x354584] 31 120 ip_queue_xmit+300 [0x355cec] 32 96 tcp_transmit_skb+812 [0x367da8] 33 40 tcp_push_one+158 [0x369fda] 34 112 tcp_sendmsg+852 [0x35d5a0] 35 240 sock_sendmsg+164 [0x32035c] 36 56 kernel_sendmsg+86 [0x32064a] 37 88 sock_no_sendpage+98 [0x322b22] 38 104 tcp_sendpage+70 [0x35cc1e] 39 48 sock_sendpage+74 [0x31eb66] 40 64 pipe_to_sendpage+102 [0x1c4b2e] 41 64 __splice_from_pipe+120 [0x1c5340] 42 72 splice_from_pipe+90 [0x1c57e6] 43 56 generic_splice_sendpage+38 [0x1c5832] 44 48 do_splice_from+104 [0x1c4c38] 45 48 direct_splice_actor+52 [0x1c4c88] 46 80 splice_direct_to_actor+180 [0x1c4f80] 47 72 do_splice_direct+70 [0x1c5112] 48 64 do_sendfile+360 [0x19de18] 49 72 sys_sendfile64+126 [0x19df32] 50 336 sysc_do_restart+18 [0x111a1a] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-10-28 17:10:21 +07:00
option the kernel will use a smaller kernel stack size. The reduced
size is 8kb instead of 16kb. This allows to run more threads on a
system and reduces the pressure on the memory management for higher
order page allocations.
Say N if you are unsure.
config CHECK_STACK
def_bool y
prompt "Detect kernel stack overflow"
help
This option enables the compiler option -mstack-guard and
-mstack-size if they are available. If the compiler supports them
it will emit additional code to each function prolog to trigger
an illegal operation if the kernel stack is about to overflow.
Say N if you are unsure.
config STACK_GUARD
int "Size of the guard area (128-1024)"
range 128 1024
depends on CHECK_STACK
default "256"
help
This allows you to specify the size of the guard area at the lower
end of the kernel stack. If the kernel stack points into the guard
area on function entry an illegal operation is triggered. The size
needs to be a power of 2. Please keep in mind that the size of an
interrupt frame is 184 bytes for 31 bit and 328 bytes on 64 bit.
The minimum size for the stack guard should be 256 for 31 bit and
512 for 64 bit.
config WARN_STACK
def_bool n
prompt "Emit compiler warnings for function with broken stack usage"
help
This option enables the compiler options -mwarn-framesize and
-mwarn-dynamicstack. If the compiler supports these options it
will generate warnings for function which either use alloca or
create a stack frame bigger than CONFIG_WARN_STACK_SIZE.
Say N if you are unsure.
config WARN_STACK_SIZE
int "Maximum frame size considered safe (128-2048)"
range 128 2048
depends on WARN_STACK
default "2048"
help
This allows you to specify the maximum frame size a function may
have without the compiler complaining about it.
config ARCH_POPULATES_NODE_MAP
def_bool y
comment "Kernel preemption"
source "kernel/Kconfig.preempt"
config ARCH_SPARSEMEM_ENABLE
def_bool y
select SPARSEMEM_VMEMMAP_ENABLE
select SPARSEMEM_VMEMMAP
select SPARSEMEM_STATIC if !64BIT
config ARCH_SPARSEMEM_DEFAULT
def_bool y
config ARCH_SELECT_MEMORY_MODEL
def_bool y
config ARCH_ENABLE_MEMORY_HOTPLUG
def_bool y if SPARSEMEM
config ARCH_ENABLE_MEMORY_HOTREMOVE
def_bool y
config ARCH_HIBERNATION_POSSIBLE
def_bool y if 64BIT
source "mm/Kconfig"
comment "I/O subsystem configuration"
config QDIO
def_tristate y
prompt "QDIO support"
---help---
This driver provides the Queued Direct I/O base support for
IBM System z.
To compile this driver as a module, choose M here: the
module will be called qdio.
If unsure, say Y.
config CHSC_SCH
def_tristate y
prompt "Support for CHSC subchannels"
help
This driver allows usage of CHSC subchannels. A CHSC subchannel
is usually present on LPAR only.
The driver creates a device /dev/chsc, which may be used to
obtain I/O configuration information about the machine and
to issue asynchronous chsc commands (DANGEROUS).
You will usually only want to use this interface on a special
LPAR designated for system management.
To compile this driver as a module, choose M here: the
module will be called chsc_sch.
If unsure, say N.
comment "Misc"
config IPL
def_bool y
prompt "Builtin IPL record support"
help
If you want to use the produced kernel to IPL directly from a
device, you have to merge a bootsector specific to the device
into the first bytes of the kernel. You will have to select the
IPL device.
choice
prompt "IPL method generated into head.S"
depends on IPL
default IPL_VM
help
Select "tape" if you want to IPL the image from a Tape.
Select "vm_reader" if you are running under VM/ESA and want
to IPL the image from the emulated card reader.
config IPL_TAPE
bool "tape"
config IPL_VM
bool "vm_reader"
endchoice
source "fs/Kconfig.binfmt"
config FORCE_MAX_ZONEORDER
int
default "9"
config PFAULT
def_bool y
prompt "Pseudo page fault support"
help
Select this option, if you want to use PFAULT pseudo page fault
handling under VM. If running native or in LPAR, this option
has no effect. If your VM does not support PFAULT, PAGEEX
pseudo page fault handling will be used.
Note that VM 4.2 supports PFAULT but has a bug in its
implementation that causes some problems.
Everybody who wants to run Linux under VM != VM4.2 should select
this option.
config SHARED_KERNEL
def_bool y
prompt "VM shared kernel support"
help
Select this option, if you want to share the text segment of the
Linux kernel between different VM guests. This reduces memory
usage with lots of guests but greatly increases kernel size.
Also if a kernel was IPL'ed from a shared segment the kexec system
call will not work.
You should only select this option if you know what you are
doing and want to exploit this feature.
config CMM
def_tristate n
prompt "Cooperative memory management"
help
Select this option, if you want to enable the kernel interface
to reduce the memory size of the system. This is accomplished
by allocating pages of memory and put them "on hold". This only
makes sense for a system running under VM where the unused pages
will be reused by VM for other guest systems. The interface
allows an external monitor to balance memory of many systems.
Everybody who wants to run Linux under VM should select this
option.
config CMM_IUCV
def_bool y
prompt "IUCV special message interface to cooperative memory management"
depends on CMM && (SMSGIUCV=y || CMM=SMSGIUCV)
help
Select this option to enable the special message interface to
the cooperative memory management.
config APPLDATA_BASE
def_bool n
prompt "Linux - VM Monitor Stream, base infrastructure"
depends on PROC_FS
help
This provides a kernel interface for creating and updating z/VM APPLDATA
monitor records. The monitor records are updated at certain time
intervals, once the timer is started.
Writing 1 or 0 to /proc/appldata/timer starts(1) or stops(0) the timer,
i.e. enables or disables monitoring on the Linux side.
A custom interval value (in seconds) can be written to
/proc/appldata/interval.
Defaults are 60 seconds interval and timer off.
The /proc entries can also be read from, showing the current settings.
config APPLDATA_MEM
def_tristate m
prompt "Monitor memory management statistics"
depends on APPLDATA_BASE && VM_EVENT_COUNTERS
help
This provides memory management related data to the Linux - VM Monitor
Stream, like paging/swapping rate, memory utilisation, etc.
Writing 1 or 0 to /proc/appldata/memory creates(1) or removes(0) a z/VM
APPLDATA monitor record, i.e. enables or disables monitoring this record
on the z/VM side.
Default is disabled.
The /proc entry can also be read from, showing the current settings.
This can also be compiled as a module, which will be called
appldata_mem.o.
config APPLDATA_OS
def_tristate m
prompt "Monitor OS statistics"
depends on APPLDATA_BASE
help
This provides OS related data to the Linux - VM Monitor Stream, like
CPU utilisation, etc.
Writing 1 or 0 to /proc/appldata/os creates(1) or removes(0) a z/VM
APPLDATA monitor record, i.e. enables or disables monitoring this record
on the z/VM side.
Default is disabled.
This can also be compiled as a module, which will be called
appldata_os.o.
config APPLDATA_NET_SUM
def_tristate m
prompt "Monitor overall network statistics"
depends on APPLDATA_BASE && NET
help
This provides network related data to the Linux - VM Monitor Stream,
currently there is only a total sum of network I/O statistics, no
per-interface data.
Writing 1 or 0 to /proc/appldata/net_sum creates(1) or removes(0) a z/VM
APPLDATA monitor record, i.e. enables or disables monitoring this record
on the z/VM side.
Default is disabled.
This can also be compiled as a module, which will be called
appldata_net_sum.o.
source kernel/Kconfig.hz
[PATCH] s390_hypfs filesystem On zSeries machines there exists an interface which allows the operating system to retrieve LPAR hypervisor accounting data. For example, it is possible to get usage data for physical and virtual cpus. In order to provide this information to user space programs, I implemented a new virtual Linux file system named 's390_hypfs' using the Linux 2.6 libfs framework. The name 's390_hypfs' stands for 'S390 Hypervisor Filesystem'. All the accounting information is put into different virtual files which can be accessed from user space. All data is represented as ASCII strings. When the file system is mounted the accounting information is retrieved and a file system tree is created with the attribute files containing the cpu information. The content of the files remains unchanged until a new update is made. An update can be triggered from user space through writing 'something' into a special purpose update file. We create the following directory structure: <mount-point>/ update cpus/ <cpu-id> type mgmtime <cpu-id> ... hyp/ type systems/ <lpar-name> cpus/ <cpu-id> type mgmtime cputime onlinetime <cpu-id> ... <lpar-name> cpus/ ... - update: File to trigger update - cpus/: Directory for all physical cpus - cpus/<cpu-id>/: Directory for one physical cpu. - cpus/<cpu-id>/type: Type name of physical zSeries cpu. - cpus/<cpu-id>/mgmtime: Physical-LPAR-management time in microseconds. - hyp/: Directory for hypervisor information - hyp/type: Typ of hypervisor (currently only 'LPAR Hypervisor') - systems/: Directory for all LPARs - systems/<lpar-name>/: Directory for one LPAR. - systems/<lpar-name>/cpus/<cpu-id>/: Directory for the virtual cpus - systems/<lpar-name>/cpus/<cpu-id>/type: Typ of cpu. - systems/<lpar-name>/cpus/<cpu-id>/mgmtime: Accumulated number of microseconds during which a physical CPU was assigned to the logical cpu and the cpu time was consumed by the hypervisor and was not provided to the LPAR (LPAR overhead). - systems/<lpar-name>/cpus/<cpu-id>/cputime: Accumulated number of microseconds during which a physical CPU was assigned to the logical cpu and the cpu time was consumed by the LPAR. - systems/<lpar-name>/cpus/<cpu-id>/onlinetime: Accumulated number of microseconds during which the logical CPU has been online. As mount point for the filesystem /sys/hypervisor/s390 is created. The update process is triggered when writing 'something' into the 'update' file at the top level hypfs directory. You can do this e.g. with 'echo 1 > update'. During the update the whole directory structure is deleted and built up again. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Ingo Oeser <ioe-lkml@rameria.de> Cc: Joern Engel <joern@wohnheim.fh-wedel.de> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Michael Holzheu <holzheu@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 16:05:06 +07:00
config S390_HYPFS_FS
def_bool y
prompt "s390 hypervisor file system support"
[PATCH] s390_hypfs filesystem On zSeries machines there exists an interface which allows the operating system to retrieve LPAR hypervisor accounting data. For example, it is possible to get usage data for physical and virtual cpus. In order to provide this information to user space programs, I implemented a new virtual Linux file system named 's390_hypfs' using the Linux 2.6 libfs framework. The name 's390_hypfs' stands for 'S390 Hypervisor Filesystem'. All the accounting information is put into different virtual files which can be accessed from user space. All data is represented as ASCII strings. When the file system is mounted the accounting information is retrieved and a file system tree is created with the attribute files containing the cpu information. The content of the files remains unchanged until a new update is made. An update can be triggered from user space through writing 'something' into a special purpose update file. We create the following directory structure: <mount-point>/ update cpus/ <cpu-id> type mgmtime <cpu-id> ... hyp/ type systems/ <lpar-name> cpus/ <cpu-id> type mgmtime cputime onlinetime <cpu-id> ... <lpar-name> cpus/ ... - update: File to trigger update - cpus/: Directory for all physical cpus - cpus/<cpu-id>/: Directory for one physical cpu. - cpus/<cpu-id>/type: Type name of physical zSeries cpu. - cpus/<cpu-id>/mgmtime: Physical-LPAR-management time in microseconds. - hyp/: Directory for hypervisor information - hyp/type: Typ of hypervisor (currently only 'LPAR Hypervisor') - systems/: Directory for all LPARs - systems/<lpar-name>/: Directory for one LPAR. - systems/<lpar-name>/cpus/<cpu-id>/: Directory for the virtual cpus - systems/<lpar-name>/cpus/<cpu-id>/type: Typ of cpu. - systems/<lpar-name>/cpus/<cpu-id>/mgmtime: Accumulated number of microseconds during which a physical CPU was assigned to the logical cpu and the cpu time was consumed by the hypervisor and was not provided to the LPAR (LPAR overhead). - systems/<lpar-name>/cpus/<cpu-id>/cputime: Accumulated number of microseconds during which a physical CPU was assigned to the logical cpu and the cpu time was consumed by the LPAR. - systems/<lpar-name>/cpus/<cpu-id>/onlinetime: Accumulated number of microseconds during which the logical CPU has been online. As mount point for the filesystem /sys/hypervisor/s390 is created. The update process is triggered when writing 'something' into the 'update' file at the top level hypfs directory. You can do this e.g. with 'echo 1 > update'. During the update the whole directory structure is deleted and built up again. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Ingo Oeser <ioe-lkml@rameria.de> Cc: Joern Engel <joern@wohnheim.fh-wedel.de> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Michael Holzheu <holzheu@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 16:05:06 +07:00
select SYS_HYPERVISOR
help
This is a virtual file system intended to provide accounting
information in an s390 hypervisor environment.
config KEXEC
def_bool n
prompt "kexec system call"
help
kexec is a system call that implements the ability to shutdown your
current kernel, and to start another kernel. It is like a reboot
but is independent of hardware/microcode support.
config ZFCPDUMP
def_bool n
prompt "zfcpdump support"
select SMP
help
Select this option if you want to build an zfcpdump enabled kernel.
Refer to <file:Documentation/s390/zfcpdump.txt> for more details on this.
config S390_GUEST
def_bool y
prompt "s390 guest support for KVM (EXPERIMENTAL)"
depends on 64BIT && EXPERIMENTAL
select VIRTIO
select VIRTIO_RING
select VIRTIO_CONSOLE
help
Select this option if you want to run the kernel as a guest under
the KVM hypervisor. This will add detection for KVM as well as a
virtio transport. If KVM is detected, the virtio console will be
the default console.
config SECCOMP
def_bool y
prompt "Enable seccomp to safely compute untrusted bytecode"
depends on PROC_FS
help
This kernel feature is useful for number crunching applications
that may need to compute untrusted bytecode during their
execution. By using pipes or other transports made available to
the process as file descriptors supporting the read/write
syscalls, it's possible to isolate those applications in
their own address space using seccomp. Once seccomp is
enabled via /proc/<pid>/seccomp, it cannot be disabled
and the task is only allowed to execute a few safe syscalls
defined by each seccomp mode.
If unsure, say Y.
endmenu
menu "Power Management"
source "kernel/power/Kconfig"
endmenu
source "net/Kconfig"
config PCMCIA
def_bool n
config CCW
def_bool y
source "drivers/Kconfig"
source "fs/Kconfig"
source "arch/s390/Kconfig.debug"
source "security/Kconfig"
source "crypto/Kconfig"
source "lib/Kconfig"
source "arch/s390/kvm/Kconfig"