With this patch we no longer reuse function tracer infrastructure, now
we register our own tracer back-end via a debugfs knob.
It's a bit more code, but that is the only downside. On the bright side we
have:
- Ability to make persistent_ram module removable (when needed, we can
move ftrace_ops struct into a module). Note that persistent_ram is still
not removable for other reasons, but with this patch it's just one
thing less to worry about;
- Pstore part is more isolated from the generic function tracer. We tried
it already by registering our own tracer in available_tracers, but that
way we're loosing ability to see the traces while we record them to
pstore. This solution is somewhere in the middle: we only register
"internal ftracer" back-end, but not the "front-end";
- When there is only pstore tracing enabled, the kernel will only write
to the pstore buffer, omitting function tracer buffer (which, of course,
still can be enabled via 'echo function > current_tracer').
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
With this support kernel can save function call chain log into a
persistent ram buffer that can be decoded and dumped after reboot
through pstore filesystem. It can be used to determine what function
was last called before a reset or panic.
We store the log in a binary format and then decode it at read time.
p.s.
Mostly the code comes from trace_persistent.c driver found in the
Android git tree, written by Colin Cross <ccross@android.com>
(according to sign-off history). I reworked the driver a little bit,
and ported it to pstore.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For function tracing we need to stop using pstore.buf directly, since
in a tracing callback we can't use spinlocks, and thus we can't safely
use the global buffer.
With write_buf callback, backends no longer need to access pstore.buf
directly, and thus we can pass any buffers (e.g. allocated on stack).
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This picks up the staging changes made in 3.5-rc4 so that everyone can sync up
properly.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Provide an iterator to receive the log buffer content, and convert all
kmsg_dump() users to it.
The structured data in the kmsg buffer now contains binary data, which
should no longer be copied verbatim to the kmsg_dump() users.
The iterator should provide reliable access to the buffer data, and also
supports proper log line-aware chunking of data while iterating.
Signed-off-by: Kay Sievers <kay@vrfy.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Reported-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Tested-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Having automatic updates seems pointless for production system, and
even dangerous and thus counter-productive:
1. If we can mount pstore, or read files, we can as well read
/proc/kmsg. So, there's little point in duplicating the
functionality and present the same information but via another
userland ABI;
2. Expecting the kernel to behave sanely after oops/panic is naive.
It might work, but you'd rather not try it. Screwed up kernel
can do rather bad things, like recursive faults[1]; and pstore
rather provoking bad things to happen. It uses:
1. Timers (assumes sane interrupts state);
2. Workqueues and mutexes (assumes scheduler in a sane state);
3. kzalloc (a working slab allocator);
That's too much for a dead kernel, so the debugging facility
itself might just make debugging harder, which is not what
we want.
Maybe for non-oops message types it would make sense to re-enable
automatic updates, but so far I don't see any use case for this.
Even for tracing, it has its own run-time/normal ABI, so we're
only interested in pstore upon next boot, to retrieve what has
gone wrong with HW or SW.
So, let's disable the updates by default.
[1]
BUG: unable to handle kernel paging request at fffffffffffffff8
IP: [<ffffffff8104801b>] kthread_data+0xb/0x20
[...]
Process kworker/0:1 (pid: 14, threadinfo ffff8800072c0000, task ffff88000725b100)
[...
Call Trace:
[<ffffffff81043710>] wq_worker_sleeping+0x10/0xa0
[<ffffffff813687a8>] __schedule+0x568/0x7d0
[<ffffffff8106c24d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81087e22>] ? call_rcu_sched+0x12/0x20
[<ffffffff8102b596>] ? release_task+0x156/0x2d0
[<ffffffff8102b45e>] ? release_task+0x1e/0x2d0
[<ffffffff8106c24d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81368ac4>] schedule+0x24/0x70
[<ffffffff8102cba8>] do_exit+0x1f8/0x370
[<ffffffff810051e7>] oops_end+0x77/0xb0
[<ffffffff8135c301>] no_context+0x1a6/0x1b5
[<ffffffff8135c4de>] __bad_area_nosemaphore+0x1ce/0x1ed
[<ffffffff81053156>] ? ttwu_queue+0xc6/0xe0
[<ffffffff8135c50b>] bad_area_nosemaphore+0xe/0x10
[<ffffffff8101fa47>] do_page_fault+0x2c7/0x450
[<ffffffff8106e34b>] ? __lock_release+0x6b/0xe0
[<ffffffff8106bf21>] ? mark_held_locks+0x61/0x140
[<ffffffff810502fe>] ? __wake_up+0x4e/0x70
[<ffffffff81185f7d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff81158970>] ? pstore_register+0x120/0x120
[<ffffffff8136a37f>] page_fault+0x1f/0x30
[<ffffffff81158970>] ? pstore_register+0x120/0x120
[<ffffffff81185ab8>] ? memcpy+0x68/0x110
[<ffffffff8115875a>] ? pstore_get_records+0x3a/0x130
[<ffffffff811590f4>] ? persistent_ram_copy_old+0x64/0x90
[<ffffffff81158bf4>] ramoops_pstore_read+0x84/0x130
[<ffffffff81158799>] pstore_get_records+0x79/0x130
[<ffffffff81042536>] ? process_one_work+0x116/0x450
[<ffffffff81158970>] ? pstore_register+0x120/0x120
[<ffffffff8115897e>] pstore_dowork+0xe/0x10
[<ffffffff81042594>] process_one_work+0x174/0x450
[<ffffffff81042536>] ? process_one_work+0x116/0x450
[<ffffffff81042e13>] worker_thread+0x123/0x2d0
[<ffffffff81042cf0>] ? manage_workers.isra.28+0x120/0x120
[<ffffffff81047d8e>] kthread+0x8e/0xa0
[<ffffffff8136ba74>] kernel_thread_helper+0x4/0x10
[<ffffffff8136a199>] ? retint_restore_args+0xe/0xe
[<ffffffff81047d00>] ? __init_kthread_worker+0x70/0x70
[<ffffffff8136ba70>] ? gs_change+0xb/0xb
Code: be e2 00 00 00 48 c7 c7 d1 2a 4e 81 e8 bf fb fd ff 48 8b 5d f0 4c 8b 65 f8 c9 c3 0f 1f 44 00 00 48 8b 87 08 02 00 00 55 48 89 e5 <48> 8b 40 f8 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
RIP [<ffffffff8104801b>] kthread_data+0xb/0x20
RSP <ffff8800072c1888>
CR2: fffffffffffffff8
---[ end trace 996a332dc399111d ]---
Fixing recursive fault but reboot is needed!
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is no behavioural change, the default value is still 60 seconds.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pstore doesn't support logging kernel messages in run-time, it only
dumps dmesg when kernel oopses/panics. This makes pstore useless for
debugging hangs caused by HW issues or improper use of HW (e.g.
weird device inserted -> driver tried to write a reserved bits ->
SoC hanged. In that case we don't get any messages in the pstore.
Therefore, let's add a runtime logging support: PSTORE_TYPE_CONSOLE.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Colin Cross <ccross@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a pstore backend doesn't want to support various portions of the
pstore interface, it can just leave those functions NULL instead of
creating no-op stubs.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This allows a backend to filter on the dmesg reason as well as the pstore
reason. When ramoops is switched to pstore, this is needed since it has
no interest in storing non-crash dmesg details.
Drop pstore_write() as it has no users, and handling the "reason" here
has no obviously correct value.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The buf_lock cannot be held while populating the inodes, so make the backend
pass forward an allocated and filled buffer instead. This solves the following
backtrace. The effect is that "buf" is only ever used to notify the backends
that something was written to it, and shouldn't be used in the read path.
To replace the buf_lock during the read path, isolate the open/read/close
loop with a separate mutex to maintain serialized access to the backend.
Note that is is up to the pstore backend to cope if the (*write)() path is
called in the middle of the read path.
[ 59.691019] BUG: sleeping function called from invalid context at .../mm/slub.c:847
[ 59.691019] in_atomic(): 0, irqs_disabled(): 1, pid: 1819, name: mount
[ 59.691019] Pid: 1819, comm: mount Not tainted 3.0.8 #1
[ 59.691019] Call Trace:
[ 59.691019] [<810252d5>] __might_sleep+0xc3/0xca
[ 59.691019] [<810a26e6>] kmem_cache_alloc+0x32/0xf3
[ 59.691019] [<810b53ac>] ? __d_lookup_rcu+0x6f/0xf4
[ 59.691019] [<810b68b1>] alloc_inode+0x2a/0x64
[ 59.691019] [<810b6903>] new_inode+0x18/0x43
[ 59.691019] [<81142447>] pstore_get_inode.isra.1+0x11/0x98
[ 59.691019] [<81142623>] pstore_mkfile+0xae/0x26f
[ 59.691019] [<810a2a66>] ? kmem_cache_free+0x19/0xb1
[ 59.691019] [<8116c821>] ? ida_get_new_above+0x140/0x158
[ 59.691019] [<811708ea>] ? __init_rwsem+0x1e/0x2c
[ 59.691019] [<810b67e8>] ? inode_init_always+0x111/0x1b0
[ 59.691019] [<8102127e>] ? should_resched+0xd/0x27
[ 59.691019] [<8137977f>] ? _cond_resched+0xd/0x21
[ 59.691019] [<81142abf>] pstore_get_records+0x52/0xa7
[ 59.691019] [<8114254b>] pstore_fill_super+0x7d/0x91
[ 59.691019] [<810a7ff5>] mount_single+0x46/0x82
[ 59.691019] [<8114231a>] pstore_mount+0x15/0x17
[ 59.691019] [<811424ce>] ? pstore_get_inode.isra.1+0x98/0x98
[ 59.691019] [<810a8199>] mount_fs+0x5a/0x12d
[ 59.691019] [<810b9174>] ? alloc_vfsmnt+0xa4/0x14a
[ 59.691019] [<810b9474>] vfs_kern_mount+0x4f/0x7d
[ 59.691019] [<810b9d7e>] do_kern_mount+0x34/0xb2
[ 59.691019] [<810bb15f>] do_mount+0x5fc/0x64a
[ 59.691019] [<810912fb>] ? strndup_user+0x2e/0x3f
[ 59.691019] [<810bb3cb>] sys_mount+0x66/0x99
[ 59.691019] [<8137b537>] sysenter_do_call+0x12/0x26
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Currently pstore write interface employs record id as return
value, but it is not enough because it can't tell caller if
the write operation is successful. Pass the record id back via
an argument pointer and return zero for success, non-zero for
failure.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
pstore was using mutex locking to protect read/write access to the
backend plug-ins. This causes problems when pstore is executed in
an NMI context through panic() -> kmsg_dump().
This patch changes the mutex to a spin_lock_irqsave then also checks to
see if we are in an NMI context. If we are in an NMI and can't get the
lock, just print a message stating that and blow by the locking.
All this is probably a hack around the bigger locking problem but it
solves my current situation of trying to sleep in an NMI context.
Tested by loading the lkdtm module and executing a HARDLOCKUP which
will cause the machine to panic inside the nmi handler.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Life is simple for all the kernel terminating types of kmsg_dump
call backs - pstore just saves the tail end of the console log. But
for "oops" the situation is more complex - the kernel may carry on
running (possibly for ever). So we'd like to make the logged copy
of the oops appear in the pstore filesystem - so that the user has
a handle to clear the entry from the persistent backing store (if
we don't, the store may fill with "oops" entries (that are also
safely stashed in /var/log/messages) leaving no space for real
errors.
Current code calls pstore_mkfile() immediately. But this may
not be safe. The oops could have happened with arbitrary locks
held, or in interrupt or NMI context. So allocating memory and
calling into generic filesystem code seems unwise.
This patch defers making the entry appear. At the time
of the oops, we merely set a flag "pstore_new_entry" noting that
a new entry has been added. A periodic timer checks once a minute
to see if the flag is set - if so, it schedules a work queue to
rescan the backing store and make all new entries appear in the
pstore filesystem.
Signed-off-by: Tony Luck <tony.luck@intel.com>
pstore only allows one backend to be registered at present, but the
system may provide several. Add a parameter to allow the user to choose
which backend will be used rather than just relying on load order.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
We'll never have a negative part, so just make this an unsigned int.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
EFI only provides small amounts of individual storage, and conventionally
puts metadata in the storage variable name. Rather than add a metadata
header to the (already limited) variable storage, it's easier for us to
modify pstore to pass all the information we need to construct a unique
variable name to the appropriate functions.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Some pstore implementations may not have a static context, so extend the
API to pass the pstore_info struct to all calls and allow for a context
pointer.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Currently after mount/remount operation on pstore filesystem,
the content on pstore will be lost. It is because current ERST
implementation doesn't support multi-user usage, which moves
internal pointer to the end after accessing it. Adding
multi-user support for pstore usage.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
the return type of function _read_ in pstore is size_t,
but in the callback function of _read_, the logic doesn't
consider it too much, which means if negative value (assuming
error here) is returned, it will be converted to positive because
of type casting. ssize_t is enough for this function.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
pstore_dump() can be called with many different "reason" codes. Save
the name of the code in the persistent store record.
Also - only worthwhile calling pstore_mkfile for KMSG_DUMP_OOPS - that
is the only one where the kernel will continue running.
Reviewed-by: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
/sys/fs is a somewhat strange way to tweak what could more
obviously be tuned with a mount option.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some platforms have a small amount of non-volatile storage that
can be used to store information useful to diagnose the cause of
a system crash. This is the generic part of a file system interface
that presents information from the crash as a series of files in
/dev/pstore. Once the information has been seen, the underlying
storage is freed by deleting the files.
Signed-off-by: Tony Luck <tony.luck@intel.com>