/*
 * Firmware Assisted dump header file.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 * Copyright 2011 IBM Corporation
 * Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
 */

#ifndef __PPC64_FA_DUMP_H__
#define __PPC64_FA_DUMP_H__

#ifdef CONFIG_FA_DUMP

/*
 * The RMA region will be saved for later dumping when the kernel crashes.
 * RMA is the Real Mode Area, the first block of logical memory owned by
 * the logical partition, containing the storage that may be accessed with
 * translation off.
 */
#define RMA_START	0x0
#define RMA_END		(ppc64_rma_size)

/*
 * On some Power systems where the RMO is 128MB, the kernel still requires
 * a minimum of 256MB to boot successfully. When the kdump infrastructure
 * is configured to save the vmcore over the network, we run into an OOM
 * issue while loading the modules related to network setup. Hence we need
 * an additional 64MB of memory to avoid the OOM issue.
 */
#define MIN_BOOT_MEM	(((RMA_END < (0x1UL << 28)) ? (0x1UL << 28) : RMA_END) \
+ (0x1UL << 26))
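
/*
 * Worked example (illustrative arithmetic only): 0x1UL << 28 is 256MB and
 * 0x1UL << 26 is 64MB. With a 128MB RMA, MIN_BOOT_MEM is therefore
 * 256MB + 64MB = 320MB; with a hypothetical 512MB RMA, it is
 * 512MB + 64MB = 576MB.
 */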

/* The upper limit percentage for user specified boot memory size (25%) */
#define MAX_BOOT_MEM_RATIO 4
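
/*
 * Illustration of how this ratio is meant to be applied (a sketch based on
 * the upstream commit that introduced it, not a copy of the kernel code;
 * "max_size" is a hypothetical name). The reserved dump area must hold a
 * copy of boot memory in addition to the CPU state data, HPTE region and
 * ELF core headers, so a boot memory size around 50% of RAM would make the
 * total exceed system memory. The user-specified size is therefore capped
 * at 25% of memblock_phys_mem_size(), which, unlike memblock_end_of_DRAM(),
 * does not count holes in the memory layout:
 *
 *	max_size = memblock_phys_mem_size() / MAX_BOOT_MEM_RATIO;
 *
 * e.g. with 8GB of installed RAM, the cap is 2GB.
 */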

#define memblock_num_regions(memblock_type)	(memblock.memblock_type.cnt)

/* Alignment per CMA requirement. */
#define FADUMP_CMA_ALIGNMENT	(PAGE_SIZE <<				\
max_t(unsigned long, MAX_ORDER - 1, pageblock_order))
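
/*
 * Worked example with hypothetical values (the actual PAGE_SIZE, MAX_ORDER
 * and pageblock_order depend on the kernel configuration): with 64KB pages
 * and max_t(unsigned long, MAX_ORDER - 1, pageblock_order) == 8, the
 * alignment is 64KB << 8 = 16MB, i.e. the CMA area would be 16MB aligned.
 */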

/* Firmware provided dump sections */
#define FADUMP_CPU_STATE_DATA	0x0001
#define FADUMP_HPTE_REGION	0x0002
#define FADUMP_REAL_MODE_REGION	0x0011

/* Dump request flag */
#define FADUMP_REQUEST_FLAG	0x00000001

/* FAD commands */
#define FADUMP_REGISTER		1
#define FADUMP_UNREGISTER	2
#define FADUMP_INVALIDATE	3

/* Dump status flag */
#define FADUMP_ERROR_FLAG	0x2000

#define FADUMP_CPU_ID_MASK	((1UL << 32) - 1)

#define CPU_UNKNOWN		(~((u32)0))

/* Utility macros */
/*
 * Advance reg_entry past the "CPUEND" entry that terminates the current
 * CPU's register list, leaving it at the first entry that follows.
 */
#define SKIP_TO_NEXT_CPU(reg_entry)					\
({									\
	while (be64_to_cpu(reg_entry->reg_id) != REG_ID("CPUEND"))	\
		reg_entry++;						\
	reg_entry++;							\
})

extern int crashing_cpu;

/* Kernel Dump section info */
struct fadump_section {
	__be32	request_flag;
	__be16	source_data_type;
	__be16	error_flags;
	__be64	source_address;
	__be64	source_len;
	__be64	bytes_dumped;
	__be64	destination_address;
};

/* ibm,configure-kernel-dump header. */
struct fadump_section_header {
	__be32	dump_format_version;
	__be16	dump_num_sections;
	__be16	dump_status_flag;
	__be32	offset_first_dump_section;

	/* Fields for disk dump option. */
	__be32	dd_block_size;
	__be64	dd_block_offset;
	__be64	dd_num_blocks;
	__be32	dd_offset_disk_path;

	/* Maximum time allowed to prevent an automatic dump-reboot. */
	__be32	max_time_auto;
};

/*
 * Firmware Assisted dump memory structure. This structure is required for
 * registering a future kernel dump with the power firmware through an RTAS
 * call.
 *
 * There is no disk dump option, hence the disk dump path string section is
 * not included.
 */
struct fadump_mem_struct {
	struct fadump_section_header	header;

	/* Kernel dump sections */
	struct fadump_section		cpu_state_data;
	struct fadump_section		hpte_region;
	struct fadump_section		rmr_region;
};

/* Firmware-assisted dump configuration details. */
struct fw_dump {
	unsigned long	cpu_state_data_size;
	unsigned long	hpte_region_size;
	unsigned long	boot_memory_size;
	unsigned long	reserve_dump_area_start;
	unsigned long	reserve_dump_area_size;
	/* cmd line option during boot */
	unsigned long	reserve_bootvar;

	unsigned long	fadumphdr_addr;
	unsigned long	cpu_notes_buf;
	unsigned long	cpu_notes_buf_size;

	int		ibm_configure_kernel_dump;

	unsigned long	fadump_enabled:1;
	unsigned long	fadump_supported:1;
	unsigned long	dump_active:1;
	unsigned long	dump_registered:1;
	/* Do not use CMA for the dump memory reservation */
	unsigned long	nocma:1;
};

/*
 * Copy the ASCII values of the first 8 characters of a string into a u64
 * variable at their respective indexes.
 * e.g. the string "FADMPINF" is converted into 0x4641444d50494e46.
 */
static inline u64 str_to_u64(const char *str)
{
	u64 val = 0;
	int i;

	for (i = 0; i < sizeof(val); i++)
		val = (*str) ? (val << 8) | *str++ : val << 8;
	return val;
}
#define STR_TO_HEX(x)	str_to_u64(x)
#define REG_ID(x)	str_to_u64(x)

#define FADUMP_CRASH_INFO_MAGIC		STR_TO_HEX("FADMPINF")
#define REGSAVE_AREA_MAGIC STR_TO_HEX("REGSAVE")
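
/*
 * Worked examples for str_to_u64(): characters fill the u64 from the most
 * significant byte down, and strings shorter than 8 characters are padded
 * with zero bytes in the low-order positions:
 *
 *	STR_TO_HEX("REGSAVE")	== 0x5245475341564500
 *	REG_ID("CPUSTRT")	== 0x4350555354525400
 *	REG_ID("CPUEND")	== 0x435055454e440000
 */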

/* The firmware-assisted dump format.
 *
 * The register save area is an area in the partition's memory used to preserve
 * the register contents (CPU state data) for the active CPUs during a firmware
 * assisted dump. The dump format contains a register save area header followed
 * by register entries. Each list of registers for a CPU starts with
 * "CPUSTRT" and ends with "CPUEND".
 */

/* Register save area header. */
struct fadump_reg_save_area_header {
	__be64	magic_number;
	__be32	version;
	__be32	num_cpu_offset;
};

/* Register entry. */
struct fadump_reg_entry {
	__be64	reg_id;
	__be64	reg_value;
};

/* fadump crash info structure */
struct fadump_crash_info_header {
	u64		magic_number;
	u64		elfcorehdr_addr;
	u32		crashing_cpu;
	struct pt_regs	regs;
	struct cpumask	online_mask;
};

struct fad_crash_memory_ranges {
	unsigned long long	base;
	unsigned long long	size;
};

extern int is_fadump_memory_area(u64 addr, ulong size);
extern int early_init_dt_scan_fw_dump(unsigned long node,
		const char *uname, int depth, void *data);
extern int fadump_reserve_mem(void);
extern int setup_fadump(void);
extern int is_fadump_active(void);
extern int should_fadump_crash(void);
extern void crash_fadump(struct pt_regs *, const char *);
extern void fadump_cleanup(void);

#else	/* CONFIG_FA_DUMP */
static inline int is_fadump_active(void) { return 0; }
static inline int should_fadump_crash(void) { return 0; }
static inline void crash_fadump(struct pt_regs *regs, const char *str) { }
#endif
#endif