The function tracer can create a dynamically allocated trampoline that is
called by the function mcount or fentry hook that is used to call the
function callback that is registered. The problem is that the orc undwinder
will bail if it encounters one of these trampolines. This breaks the stack
trace of function callbacks, which include the stack tracer and setting the
stack trace for individual functions.
Since these dynamic trampolines are basically copies of the static ftrace
trampolines defined in ftrace_*.S, we do not need to create new orc entries
for the dynamic trampolines. Finding the return address on the stack will be
identical as the functions that were copied to create the dynamic
trampolines. When encountering a ftrace dynamic trampoline, we can just use
the orc entry of the ftrace static function that was copied for that
trampoline.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
- Now allow module init functions to be traced
- Clean up some unused or not used by config events (saves space)
- Clean up of trace histogram code
- Add support for preempt and interrupt enabled/disable events
- Other various clean ups
-----BEGIN PGP SIGNATURE-----
iQHIBAABCgAyFiEEPm6V/WuN2kyArTUe1a05Y9njSUkFAloPGgkUHHJvc3RlZHRA
Z29vZG1pcy5vcmcACgkQ1a05Y9njSUmfaAwAjge5FWBCBQeby8tVuw4RGAorRgl5
IFuijFSygcKRMhQFP6B+haHsezeCbNaBBtIncXhoJGDC5XuhUhr9foYf1SChEmYp
tCOK2o71FgZ8yG539IYCVjG9cJZxPLM0OI7RQ8hcMETAr+eiXPXxHrmrm9kdBtYM
ZAQERvqI5yu2HWIb87KBc38H0rgYrOJKZt9Rx20as/aqAME7hFvYErFlcnxdmHo+
LmovJOQBCTicNJ4TXJc418JaUWi9cm/A3uhW3o5aLMoRAxCc/8FD+dq2rg4qlHDH
tOtK6pwIPHfqRZ3nMLXXWhaa+w+swsxBOnegkvgP2xCyibKjFgh9kzcpaj41w3x1
0FCfvS7flx9ob//fAB8kxLvJyY5p3Qp3xdvj0+gp2qa3Ga5lSqcMzS419TLY1Yfa
Jpi2oAagDqP94m0EjAGTkhZMOrsFIDr49g3h7nqz3T3Z54luyXniDoYoO11d+dUF
vCUiIJz/PsQIE3NVViZiaRtcLVXneLHISmnz
=h3F2
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from
- allow module init functions to be traced
- clean up some unused or not used by config events (saves space)
- clean up of trace histogram code
- add support for preempt and interrupt enabled/disable events
- other various clean ups
* tag 'trace-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits)
tracing, thermal: Hide cpu cooling trace events when not in use
tracing, thermal: Hide devfreq trace events when not in use
ftrace: Kill FTRACE_OPS_FL_PER_CPU
perf/ftrace: Small cleanup
perf/ftrace: Fix function trace events
perf/ftrace: Revert ("perf/ftrace: Fix double traces of perf on ftrace:function")
tracing, dma-buf: Remove unused trace event dma_fence_annotate_wait_on
tracing, memcg, vmscan: Hide trace events when not in use
tracing/xen: Hide events that are not used when X86_PAE is not defined
tracing: mark trace_test_buffer as __maybe_unused
printk: Remove superfluous memory barriers from printk_safe
ftrace: Clear hashes of stale ips of init memory
tracing: Add support for preempt and irq enable/disable events
tracing: Prepare to add preempt and irq trace events
ftrace/kallsyms: Have /proc/kallsyms show saved mod init functions
ftrace: Add freeing algorithm to free ftrace_mod_maps
ftrace: Save module init functions kallsyms symbols for tracing
ftrace: Allow module init functions to be traced
ftrace: Add a ftrace_free_mem() function for modules to use
tracing: Reimplement log2
...
The one and only user of FTRACE_OPS_FL_PER_CPU is gone, remove the
lot.
Link: http://lkml.kernel.org/r/20171011080224.372422809@infradead.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Filters should be cleared of init functions during freeing of init
memory when the ftrace dyn records are released. However in current
code, the filters are left as is. This patch clears the hashes of the
saved init functions when the init memory is freed. This fixes the
following issue reproducible with the following sequence of commands for
a test module:
================================================
void bar(void)
{
printk(KERN_INFO "bar!\n");
}
void foo(void)
{
printk(KERN_INFO "foo!\n");
bar();
}
static int __init hello_init(void)
{
printk(KERN_INFO "Hello world!\n");
foo();
return 0;
}
static void __exit hello_cleanup(void)
{
printk(KERN_INFO "Cleaning up module.\n");
}
module_init(hello_init);
module_exit(hello_cleanup);
================================================
Commands:
echo '*:mod:test' > /d/tracing/set_ftrace_filter
echo function > /d/tracing/current_tracer
modprobe test
rmmod test
sleep 1
modprobe test
cat /d/tracing/set_ftrace_filter
Behavior without patch: Init function is still in the filter
Expected behavior: Shouldn't have any of the filters set
Link: http://lkml.kernel.org/r/20171009192931.56401-1-joelaf@google.com
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If a module is loaded while tracing is enabled, then there's a possibility
that the module init functions were traced. These functions have their name
and address stored by ftrace such that it can translate the function address
that is written into the buffer into a human readable function name.
As userspace tools may be doing the same, they need a way to map function
names to their address as well. This is done through reading /proc/kallsyms.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The ftrace_mod_map is a descriptor to save module init function names in
case they were traced, and the trace output needs to reference the function
name from the function address. But after the function is unloaded, it
the maps should be freed, as the rest of the function names are as well.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Allow for module init sections to be traced as well as core kernel init
sections. Now that filtering modules functions can be stored, for when they
are loaded, it makes sense to be able to trace them.
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
In order to be able to trace module init functions, the module code needs to
tell ftrace what is being freed when the init sections are freed. Use the
code that the main init calls to tell ftrace to free the main init sections.
This requires passing in a start and end address to free.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The trampoline allocated by function tracer was overwriten by function_graph
tracer, and caused a memory leak. The save_global_trampoline should have
saved the previous trampoline in register_ftrace_graph() and restored it in
unregister_ftrace_graph(). But as it is implemented, save_global_trampoline was
only used in unregister_ftrace_graph as default value 0, and it overwrote the
previous trampoline's value. Causing the previous allocated trampoline to be
lost.
kmmeleak backtrace:
kmemleak_vmalloc+0x77/0xc0
__vmalloc_node_range+0x1b5/0x2c0
module_alloc+0x7c/0xd0
arch_ftrace_update_trampoline+0xb5/0x290
ftrace_startup+0x78/0x210
register_ftrace_function+0x8b/0xd0
function_trace_init+0x4f/0x80
tracing_set_tracer+0xe6/0x170
tracing_set_trace_write+0x90/0xd0
__vfs_write+0x37/0x170
vfs_write+0xb2/0x1b0
SyS_write+0x55/0xc0
do_syscall_64+0x67/0x180
return_from_SYSCALL_64+0x0/0x6a
[
Looking further into this, I found that this was left over from when the
function and function graph tracers shared the same ftrace_ops. But in
commit 5f151b2401 ("ftrace: Fix function_profiler and function tracer
together"), the two were separated, and the save_global_trampoline no
longer was necessary (and it may have been broken back then too).
-- Steven Rostedt
]
Link: http://lkml.kernel.org/r/20170912021454.5976-1-shuwang@redhat.com
Cc: stable@vger.kernel.org
Fixes: 5f151b2401 ("ftrace: Fix function_profiler and function tracer together")
Signed-off-by: Shu Wang <shuwang@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If function tracing is disabled by the user via the function-trace option or
the proc sysctl file, and a ftrace_ops that was allocated on the heap is
unregistered, then the shutdown code exits out without doing the proper
clean up. This was found via kmemleak and running the ftrace selftests, as
one of the tests unregisters with function tracing disabled.
# cat kmemleak
unreferenced object 0xffffffffa0020000 (size 4096):
comm "swapper/0", pid 1, jiffies 4294668889 (age 569.209s)
hex dump (first 32 bytes):
55 ff 74 24 10 55 48 89 e5 ff 74 24 18 55 48 89 U.t$.UH...t$.UH.
e5 48 81 ec a8 00 00 00 48 89 44 24 50 48 89 4c .H......H.D$PH.L
backtrace:
[<ffffffff81d64665>] kmemleak_vmalloc+0x85/0xf0
[<ffffffff81355631>] __vmalloc_node_range+0x281/0x3e0
[<ffffffff8109697f>] module_alloc+0x4f/0x90
[<ffffffff81091170>] arch_ftrace_update_trampoline+0x160/0x420
[<ffffffff81249947>] ftrace_startup+0xe7/0x300
[<ffffffff81249bd2>] register_ftrace_function+0x72/0x90
[<ffffffff81263786>] trace_selftest_ops+0x204/0x397
[<ffffffff82bb8971>] trace_selftest_startup_function+0x394/0x624
[<ffffffff81263a75>] run_tracer_selftest+0x15c/0x1d7
[<ffffffff82bb83f1>] init_trace_selftests+0x75/0x192
[<ffffffff81002230>] do_one_initcall+0x90/0x1e2
[<ffffffff82b7d620>] kernel_init_freeable+0x350/0x3fe
[<ffffffff81d61ec3>] kernel_init+0x13/0x122
[<ffffffff81d72c6a>] ret_from_fork+0x2a/0x40
[<ffffffffffffffff>] 0xffffffffffffffff
Cc: stable@vger.kernel.org
Fixes: 12cce594fa ("ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When a ftrace filter has a module function, and that module is removed, the
filter still has its address as being enabled. This can cause interesting
side effects. Nothing dangerous, but unwanted functions can be traced
because of it.
# cd /sys/kernel/tracing
# echo ':mod:snd_seq' > set_ftrace_filter
# cat set_ftrace_filter
snd_use_lock_sync_helper [snd_seq]
check_event_type_and_length [snd_seq]
snd_seq_ioctl_pversion [snd_seq]
snd_seq_ioctl_client_id [snd_seq]
snd_seq_ioctl_get_queue_tempo [snd_seq]
update_timestamp_of_queue [snd_seq]
snd_seq_ioctl_get_queue_status [snd_seq]
snd_seq_set_queue_tempo [snd_seq]
snd_seq_ioctl_set_queue_tempo [snd_seq]
snd_seq_ioctl_get_queue_timer [snd_seq]
seq_free_client1 [snd_seq]
[..]
# rmmod snd_seq
# cat set_ftrace_filter
# modprobe kvm
# cat set_ftrace_filter
kvm_set_cr4 [kvm]
kvm_emulate_hypercall [kvm]
kvm_set_dr [kvm]
This is because removing the snd_seq module after it was being filtered,
left the address of the snd_seq functions in the hash. When the kvm module
was loaded, some of its functions were loaded at the same address as the
snd_seq module. This would enable them to be filtered and traced.
Now we don't want to clear the hash completely. That would cause removing a
module where only its functions are filtered, to cause the tracing to enable
all functions, as an empty filter means to trace all functions. Instead,
just set the hash ip address to zero. Then it will never match any function.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
There's a small race when function graph shutsdown and the calling of the
registered function graph entry callback. The callback must not reference
the task's ret_stack without first checking that it is not NULL. Note, when
a ret_stack is allocated for a task, it stays allocated until the task exits.
The problem here, is that function_graph is shutdown, and a new task was
created, which doesn't have its ret_stack allocated. But since some of the
functions are still being traced, the callbacks can still be called.
The normal function_graph code handles this, but starting with commit
8861dd303c ("ftrace: Access ret_stack->subtime only in the function
profiler") the profiler code references the ret_stack on function entry, but
doesn't check if it is NULL first.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196611
Cc: stable@vger.kernel.org
Fixes: 8861dd303c ("ftrace: Access ret_stack->subtime only in the function profiler")
Reported-by: lilydjwg@gmail.com
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The variables which are processed by RCU functions should be annotated
as RCU, otherwise sparse will report the errors like below:
"error: incompatible types in comparison expression (different
address spaces)"
Link: http://lkml.kernel.org/r/1496823171-7758-1-git-send-email-zhang.chunyan@linaro.org
Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
[ Updated to not be 100% 80 column strict ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
My static checker complains that if "func" is NULL then "clear_filter"
is uninitialized. This seems like it could be true, although it's
possible something subtle is happening that I haven't seen.
kernel/trace/ftrace.c:3844 match_records()
error: uninitialized symbol 'clear_filter'.
Link: http://lkml.kernel.org/r/20170712073556.h6tkpjcdzjaozozs@mwanda
Cc: stable@vger.kernel.org
Fixes: f0a3b154bd ("ftrace: Clarify code for mod command")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
"func" can't be NULL and it doesn't make sense to check because we've
already derefenced it.
Link: http://lkml.kernel.org/r/20170712073340.4enzeojeoupuds5a@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When modules are disabled, we get a harmless build warning:
kernel/trace/ftrace.c:4051:13: error: 'process_cached_mods' defined but not used [-Werror=unused-function]
This adds the same #ifdef around the new code that exists around
its caller.
Link: http://lkml.kernel.org/r/20170710084413.1820568-1-arnd@arndb.de
Fixes: d7fbf8df7c ("ftrace: Implement cached modules tracing on module load")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
As writing into stack_trace_filter, the iter-tr is not set and is NULL.
Check if it is NULL before dereferencing it in ftrace_regex_release().
Fixes: 8c08f0d5c6 ("ftrace: Have cached module filters be an active filter")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Need to get the changes from 0f17976568 ("ftrace: Fix regression with
module command in stack_trace_filter") as it is required to fix some other
changes with stack_trace_filter and the new development code.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When doing the following command:
# echo ":mod:kvm_intel" > /sys/kernel/tracing/stack_trace_filter
it triggered a crash.
This happened with the clean up of probes. It required all callers to the
regex function (doing ftrace filtering) to have ops->private be a pointer to
a trace_array. But for the stack tracer, that is not the case.
Allow for the ops->private to be NULL, and change the function command
callbacks to handle the trace_array pointer being NULL as well.
Fixes: d2afd57a4b ("tracing/ftrace: Allow instances to have their own function probes")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Init boot up functions may be traced, but they are also freed when the
kernel finishes booting. These are removed from the ftrace tables, and the
debug variable for dyn_ftrace_total_info needs to reflect that as well.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If the new_hash fails to allocate, then unlock the hash mutex on error.
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The dyn_ftrace_total_info file is used to show how many functions have been
converted into nops and can be used by ftrace. The problem is that it does
not get decremented when functions are removed (init boot code being freed,
and modules being freed). That means the number is very inaccurate everytime
functions are removed from the ftrace tables. Decrement it when functions
are removed.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When a module filter is added to set_ftrace_filter, if the module is not
loaded, it is cached. This should be considered an active filter, and
function tracing should be filtered by this. That is, if a cached module
filter is the only filter set, then no function tracing should be happening,
as all the functions available will be filtered out.
This makes sense, as the reason to add a cached module filter, is to trace
the module when you load it. There shouldn't be any other tracing happening
until then.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If a module is cached in the set_ftrace_filter, and that module is loaded,
then enable tracing on that module as if the cached module text was written
into set_ftrace_filter just as the module is loaded.
# echo ":mod:kvm_intel" >
# cat /sys/kernel/tracing/set_ftrace_filter
#### all functions enabled ####
:mod:kvm_intel
# modprobe kvm_intel
# cat /sys/kernel/tracing/set_ftrace_filter
vmx_get_rflags [kvm_intel]
vmx_get_pkru [kvm_intel]
vmx_get_interrupt_shadow [kvm_intel]
vmx_rdtscp_supported [kvm_intel]
vmx_invpcid_supported [kvm_intel]
[..]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When writing in a module filter into set_ftrace_filter for a module that is
not yet loaded, it it cached, and will be executed when the module is loaded
(although that is not implemented yet at this commit). Display the list of
cached modules to be traced.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This is the start of the infrastructure work to allow for tracing module
functions before it is loaded.
Currently the following command:
# echo :mod:some-mod > set_ftrace_filter
will enable tracing of all functions within the module "some-mod" if it is
loaded. What we want, is if the module is not loaded, that line will be
saved. When the module is loaded, then the "some-mod" will have that line
executed on it, so that the functions within it starts being traced.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Dan Carpenter sent a patch to remove a check in ftrace_match_record()
because the logic of the code made the check redundant. I looked deeper into
the code, and made the following logic table, with the three variables and
the result of the original code.
modname mod_matches exclude_mod result
------- ----------- ----------- ------
0 0 0 return 0
0 0 1 func_match
0 1 * < cannot exist >
1 0 0 return 0
1 0 1 func_match
1 1 0 func_match
1 1 1 return 0
Notice that when mod_matches == exclude mod, the result is always to
return 0, and when mod_matches != exclude_mod, then the result is to test
the function. This means we only need test if mod_matches is equal to
exclude_mod.
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
We know that "mod_matches" is true here so there is no need to check
again.
Link: http://lkml.kernel.org/r/20170331152130.GA4947@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
o Pretty much a full rewrite of the processing of function plugins.
i.e. echo do_IRQ:stacktrace > set_ftrace_filter
o The rewrite was needed to add plugins to be unique to tracing instances.
i.e. mkdir instance/foo; cd instances/foo; echo do_IRQ:stacktrace > set_ftrace_filter
The old way was written very hacky. This removes a lot of those hacks.
o New "function-fork" tracing option. When set, pids in the set_ftrace_pid
will have their children added when the processes with their pids
listed in the set_ftrace_pid file forks.
o Exposure of "maxactive" for kretprobe in kprobe_events
o Allow for builtin init functions to be traced by the function tracer
(via the kernel command line). Module init function tracing will come
in the next release.
o Added more selftests, and have selftests also test in an instance.
-----BEGIN PGP SIGNATURE-----
iQExBAABCAAbBQJZCRchFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
zuIH/RsLUb8Hj6GmhAvn/tblUDzWyqlXX2h79VVlo/XrWayHYNHnKOmua1WwMZC6
xESXb/AffAc89VWTkKsrwaK7yfRPG6+w8zTZOcFuXSBpqSGG/oey9Fxj5Wqqpche
oJ2UY7ngxANAipkP5GxdYTafFSoWhGZGfUUtW+5tAHoFHzqO2lOjO8olbXP69sON
kVX/b461S20cVvRe5H/F0klXLSc37Tlp5YznXy4H4V4HcJSN1Fb6/uozOXALZ4se
SBpVMWmVVoGJorzj+ic7gVOeohvC8RnR400HbeMVwaI0Lj50noidDj/5Hv8F7T+D
h1B8vATNZLFAFUOSHINCBIu6Vj0=
=t8mg
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"New features for this release:
- Pretty much a full rewrite of the processing of function plugins.
i.e. echo do_IRQ:stacktrace > set_ftrace_filter
- The rewrite was needed to add plugins to be unique to tracing
instances. i.e. mkdir instance/foo; cd instances/foo; echo
do_IRQ:stacktrace > set_ftrace_filter The old way was written very
hacky. This removes a lot of those hacks.
- New "function-fork" tracing option. When set, pids in the
set_ftrace_pid will have their children added when the processes
with their pids listed in the set_ftrace_pid file forks.
- Exposure of "maxactive" for kretprobe in kprobe_events
- Allow for builtin init functions to be traced by the function
tracer (via the kernel command line). Module init function tracing
will come in the next release.
- Added more selftests, and have selftests also test in an instance"
* tag 'trace-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (60 commits)
ring-buffer: Return reader page back into existing ring buffer
selftests: ftrace: Allow some event trigger tests to run in an instance
selftests: ftrace: Have some basic tests run in a tracing instance too
selftests: ftrace: Have event tests also run in an tracing instance
selftests: ftrace: Make func_event_triggers and func_traceonoff_triggers tests do instances
selftests: ftrace: Allow some tests to be run in a tracing instance
tracing/ftrace: Allow for instances to trigger their own stacktrace probes
tracing/ftrace: Allow for the traceonoff probe be unique to instances
tracing/ftrace: Enable snapshot function trigger to work with instances
tracing/ftrace: Allow instances to have their own function probes
tracing/ftrace: Add a better way to pass data via the probe functions
ftrace: Dynamically create the probe ftrace_ops for the trace_array
tracing: Pass the trace_array into ftrace_probe_ops functions
tracing: Have the trace_array hold the list of registered func probes
ftrace: If the hash for a probe fails to update then free what was initialized
ftrace: Have the function probes call their own function
ftrace: Have each function probe use its own ftrace_ops
ftrace: Have unregister_ftrace_function_probe_func() return a value
ftrace: Add helper function ftrace_hash_move_and_update_ops()
ftrace: Remove data field from ftrace_func_probe structure
...
Pass around the local trace_array that is the descriptor for tracing
instances, when enabling and disabling probes. This by default sets the
enable/disable of event probe triggers to work with instances.
The other probes will need some more work to get them working with
instances.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
With the redesign of the registration and execution of the function probes
(triggers), data can now be passed from the setup of the probe to the probe
callers that are specific to the trace_array it is on. Although, all probes
still only affect the toplevel trace array, this change will allow for
instances to have their own probes separated from other instances and the
top array.
That is, something like the stacktrace probe can be set to trace only in an
instance and not the toplevel trace array. This isn't implement yet, but
this change sets the ground work for the change.
When a probe callback is triggered (someone writes the probe format into
set_ftrace_filter), it calls register_ftrace_function_probe() passing in
init_data that will be used to initialize the probe. Then for every matching
function, register_ftrace_function_probe() will call the probe_ops->init()
function with the init data that was passed to it, as well as an address to
a place holder that is associated with the probe and the instance. The first
occurrence will have a NULL in the pointer. The init() function will then
initialize it. If other probes are added, or more functions are part of the
probe, the place holder will be passed to the init() function with the place
holder data that it was initialized to the last time.
Then this place_holder is passed to each of the other probe_ops functions,
where it can be used in the function callback. When the probe_ops free()
function is called, it can be called either with the rip of the function
that is being removed from the probe, or zero, indicating that there are no
more functions attached to the probe, and the place holder is about to be
freed. This gives the probe_ops a way to free the data it assigned to the
place holder if it was allocade during the first init call.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
In order to eventually have each trace_array instance have its own unique
set of function probes (triggers), the trace array needs to hold the ops and
the filters for the probes.
This is the first step to accomplish this. Instead of having the private
data of the probe ops point to the trace_array, create a separate list that
the trace_array holds. There's only one private_data for a probe, we need
one per trace_array. The probe ftrace_ops will be dynamically created for
each instance, instead of being static.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pass the trace_array associated to a ftrace_probe_ops into the probe_ops
func(), init() and free() functions. The trace_array is the descriptor that
describes a tracing instance. This will help create the infrastructure that
will allow having function probes unique to tracing instances.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add a link list to the trace_array to hold func probes that are registered.
Currently, all function probes are the same for all instances as it was
before, that is, only the top level trace_array holds the function probes.
But this lays the ground work to have function probes be attached to
individual instances, and having the event trigger only affect events in the
given instance. But that work is still to be done.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If the ftrace_hash_move_and_update_ops() fails, and an ops->free() function
exists, then it needs to be called on all the ops that were added by this
registration.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Now that the function probes have their own ftrace_ops, there's no reason to
continue using the ftrace_func_hash to find which probe to call in the
function callback. The ops that is passed in to the function callback is
part of the probe_ops to call.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Have the function probes have their own ftrace_ops, and remove the
trace_probe_ops. This simplifies some of the ftrace infrastructure code.
Individual entries for each function is still allocated for the use of the
output for set_ftrace_filter, but they will be removed soon too.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently unregister_ftrace_function_probe_func() is a void function. It
does not give any feedback if an error occurred or no item was found to
remove and nothing was done.
Change it to return status and success if it removed something. Also update
the callers to return that feedback to the user.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The processes of updating a ops filter_hash is a bit complex, and requires
setting up an old hash to perform the update. This is done exactly the same
in two locations for the same reasons. Create a helper function that does it
in one place.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
No users of the function probes uses the data field anymore. Remove it, and
change the init function to take a void *data parameter instead of a
void **data, because the init will just get the data that the registering
function was received, and there's no state after it is called.
The other functions for ftrace_probe_ops still take the data parameter, but
it will currently only be passed NULL. It will stay as a parameter for
future data to be passed to these functions.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
None of the probe users uses the data field anymore of the entry. They all
have their own print() function. Remove showing the data field in the
generic function as the data field will be going away.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
There are no users of unregister_ftrace_function_probe_all(). The only probe
function that is used is unregister_ftrace_function_probe_func(). Rename the
internal static function __unregister_ftrace_function_probe() to
unregister_ftrace_function_probe_func() and make it global.
Also remove the PROBE_TEST_FUNC as it would be always set.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Nothing calls unregister_ftrace_function_probe(). Remove it as well as the
flag PROBE_TEST_DATA, as this function was the only one to set it.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
In order to move the ops to the function probes directly, they need a way to
map function ips to their own data without depending on the infrastructure
of the function probes, as the data field will be going away.
New helper functions are added that are based on the ftrace_hash code.
ftrace_func_mapper functions are there to let the probes map ips to their
data. These can be allocated by the probe ops, and referenced in the
function callbacks.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
In preparation to cleaning up the probe function registration code, the
"data" parameter will eventually be removed from the probe->func() call.
Instead it will receive its own "ops" function, in which it can set up its
own data that it needs to map.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The function-fork option is same as event-fork that it tracks task
fork/exit and set the pid filter properly. This can be useful if user
wants to trace selected tasks including their children only.
Link: http://lkml.kernel.org/r/20170417024430.21194-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When function tracer has a pid filter, it adds a probe to sched_switch
to track if current task can be ignored. The probe checks the
ftrace_ignore_pid from current tr to filter tasks. But it misses to
delete the probe when removing an instance so that it can cause a crash
due to the invalid tr pointer (use-after-free).
This is easily reproducible with the following:
# cd /sys/kernel/debug/tracing
# mkdir instances/buggy
# echo $$ > instances/buggy/set_ftrace_pid
# rmdir instances/buggy
============================================================================
BUG: KASAN: use-after-free in ftrace_filter_pid_sched_switch_probe+0x3d/0x90
Read of size 8 by task kworker/0:1/17
CPU: 0 PID: 17 Comm: kworker/0:1 Tainted: G B 4.11.0-rc3 #198
Call Trace:
dump_stack+0x68/0x9f
kasan_object_err+0x21/0x70
kasan_report.part.1+0x22b/0x500
? ftrace_filter_pid_sched_switch_probe+0x3d/0x90
kasan_report+0x25/0x30
__asan_load8+0x5e/0x70
ftrace_filter_pid_sched_switch_probe+0x3d/0x90
? fpid_start+0x130/0x130
__schedule+0x571/0xce0
...
To fix it, use ftrace_clear_pids() to unregister the probe. As
instance_rmdir() already updated ftrace codes, it can just free the
filter safely.
Link: http://lkml.kernel.org/r/20170417024430.21194-2-namhyung@kernel.org
Fixes: 0c8916c342 ("tracing: Add rmdir to remove multibuffer instances")
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
t_hash_start() does not increment *pos, where as t_next() must. But when
t_next() does increment *pos, it must still pass in the original *pos to
t_hash_start() otherwise it will skip the first instance:
# cd /sys/kernel/debug/tracing
# echo schedule:traceoff > set_ftrace_filter
# echo do_IRQ:traceoff > set_ftrace_filter
# echo call_rcu > set_ftrace_filter
# cat set_ftrace_filter
call_rcu
schedule:traceoff:unlimited
do_IRQ:traceoff:unlimited
The above called t_hash_start() from t_start() as there was only one
function (call_rcu), but if we add another function:
# echo xfrm_policy_destroy_rcu >> set_ftrace_filter
# cat set_ftrace_filter
call_rcu
xfrm_policy_destroy_rcu
do_IRQ:traceoff:unlimited
The "schedule:traceoff" disappears.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The function tracer needs to be more careful than other subsystems when it
comes to freeing data. Especially if that data is actually executable code.
When a single function is traced, a trampoline can be dynamically allocated
which is called to jump to the function trace callback. When the callback is
no longer needed, the dynamic allocated trampoline needs to be freed. This
is where the issues arise. The dynamically allocated trampoline must not be
used again. As function tracing can trace all subsystems, including
subsystems that are used to serialize aspects of freeing (namely RCU), it
must take extra care when doing the freeing.
Before synchronize_rcu_tasks() was around, there was no way for the function
tracer to know that nothing was using the dynamically allocated trampoline
when CONFIG_PREEMPT was enabled. That's because a task could be indefinitely
preempted while sitting on the trampoline. Now with synchronize_rcu_tasks(),
it will wait till all tasks have either voluntarily scheduled (not on the
trampoline) or goes into userspace (not on the trampoline). Then it is safe
to free the trampoline even with CONFIG_PREEMPT set.
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Relying on free_reserved_area() to call ftrace to free init memory proved to
not be sufficient. The issue is that on x86, when debug_pagealloc is
enabled, the init memory is not freed, but simply set as not present. Since
ftrace was uninformed of this, starting function tracing still tries to
update pages that are not present according to the page tables, causing
ftrace to bug, as well as killing the kernel itself.
Instead of relying on free_reserved_area(), have init/main.c call ftrace
directly just before it frees the init memory. Then it needs to use
__init_begin and __init_end to know where the init memory location is.
Looking at all archs (and testing what I can), it appears that this should
work for each of them.
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
I noticed that if I use dd to read the set_ftrace_filter file that the first
hash command is repeated.
# cd /sys/kernel/debug/tracing
# echo schedule > set_ftrace_filter
# echo do_IRQ >> set_ftrace_filter
# echo schedule:traceoff >> set_ftrace_filter
# echo do_IRQ:traceoff >> set_ftrace_filter
# cat set_ftrace_filter
schedule
do_IRQ
schedule:traceoff:unlimited
do_IRQ:traceoff:unlimited
# dd if=set_ftrace_filter bs=1
schedule
do_IRQ
schedule:traceoff:unlimited
schedule:traceoff:unlimited
do_IRQ:traceoff:unlimited
98+0 records in
98+0 records out
98 bytes copied, 0.00265011 s, 37.0 kB/s
This is due to the way t_start() calls t_next() as well as the seq_file
calls t_next() and the state is slightly different between the two. Namely,
t_start() will call t_next() with a local "pos" variable.
By separating out the function listing from t_next() into its own function,
we can have better control of outputting the functions and the hash of
triggers. This simplifies the code.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If all functions are enabled, there's a comment displayed in the file to
denote that:
# cd /sys/kernel/debug/tracing
# cat set_ftrace_filter
#### all functions enabled ####
If a function trigger is set, those are displayed as well:
# echo schedule:traceoff >> /debug/tracing/set_ftrace_filter
# cat set_ftrace_filter
#### all functions enabled ####
schedule:traceoff:unlimited
But if you read that file with dd, the output can change:
# dd if=/debug/tracing/set_ftrace_filter bs=1
#### all functions enabled ####
32+0 records in
32+0 records out
32 bytes copied, 7.0237e-05 s, 456 kB/s
This is because the "pos" variable is updated for the comment, but func_pos
is not. "func_pos" is used by the triggers (or hashes) to know how many
functions were printed and it bases its index from the pos - func_pos.
func_pos should be 1 to count for the comment printed. But since it is not,
t_hash_start() thinks that one trigger was already printed.
The cat gets to t_hash_start() via t_next() and not t_start() which updates
both pos and func_pos.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The loop in t_start() of calling t_next() will call t_hash_start() if the
pos is beyond the functions and enters the hash items. There's no reason to
check if p is NULL and call t_hash_start(), as that would be redundant.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Instead of testing if the hash to use is the filter_hash or the notrace_hash
at each iteration, do the test at open, and set the iter->hash to point to
the corresponding filter or notrace hash. Then use that directly instead of
testing which hash needs to be used each iteration.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The return status check of __seq_open_private() is rather strange:
iter = __seq_open_private();
if (iter) {
/* do stuff */
}
return iter ? 0 : -ENOMEM;
It makes much more sense to do the return of failure right away:
iter = __seq_open_private();
if (!iter)
return -ENOMEM;
/* do stuff */
return 0;
This clean up will make updates to this code a bit nicer.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Adding a hook into free_reserve_area() that informs ftrace that boot up init
text is being free, lets ftrace safely remove those init functions from its
records, which keeps ftrace from trying to modify text that no longer
exists.
Note, this still does not allow for tracing .init text of modules, as
modules require different work for freeing its init code.
Link: http://lkml.kernel.org/r/1488502497.7212.24.camel@linux.intel.com
Cc: linux-mm@kvack.org
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Requested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
window. Namely powerpc broke as jump labels uses the two LSB bits as flags
in initialization. A check was added to make sure that all jump label
entries were 4 bytes aligned, but powerpc didn't work that way for modules.
Adding an alignment in the module linker script appeared to be the best
solution.
Jump labels also added an anonymous union to access those LSB bits as a
normal long. But because this structure had static initialization, it broke
older compilers that could not statically initialize anonymous unions
without brackets.
The command line parameter for setting function graph filter broke the
"EMPTY_HASH" descriptor by modifying it instead of creating a new hash to
hold the entries.
The command line parameter ftrace_graph_max_depth was added to allow its
setting at boot time. It uses existing code and only the command line hook
was added. This is not really a fix, but as it uses existing code without
affecting anything else, I added it to this release. It was ready before the
merge window closed, but I wanted to let it sit in linux-next for a couple
of days first.
-----BEGIN PGP SIGNATURE-----
iQExBAABCAAbBQJYvNrAFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
JGQIAMkayeZ0OCyYHRPR4EcCrdE3fATmt1huJWHrMPnT4/fLabL8XQqrOpnOBMq1
GFZb1SMkBmvGtAHF4GbvCxnIUfDQko6BTQAd8EMea1WM8+Kb66/BLgJawjWIU9I0
dNYre9ONgR2NOzkz6nfKRXnmy0lRcOweBb09YYGSzY11Md7d8T3T4TUrPNZdYrO9
8ZMbF4qRd9KLMRHcsWqvhWhBISxWnmtUSlthfweukKgDMy8OKpb7pR0ckjtYwsWX
RF41jqLqzSUqtd/nE2Sj/aT8XOP4pfrKEUuNM4SBj8q5jmNcZuqi8Q9wItu3LWR2
jqM/9UKTzaCr9cchwuvUC0i+jWc=
=kDql
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"There was some breakage with the changes for jump labels in the 4.11
merge window:
- powerpc broke as jump labels uses the two LSB bits as flags in
initialization.
A check was added to make sure that all jump label entries were 4
bytes aligned, but powerpc didn't work that way for modules. Adding
an alignment in the module linker script appeared to be the best
solution.
- Jump labels also added an anonymous union to access those LSB bits
as a normal long. But because this structure had static
initialization, it broke older compilers that could not statically
initialize anonymous unions without brackets.
- The command line parameter for setting function graph filter broke
the "EMPTY_HASH" descriptor by modifying it instead of creating a
new hash to hold the entries.
- The command line parameter ftrace_graph_max_depth was added to
allow its setting at boot time. It uses existing code and only the
command line hook was added.
This is not really a fix, but as it uses existing code without
affecting anything else, I added it to this release. It was ready
before the merge window closed, but I wanted to let it sit in
linux-next for a couple of days first"
* tag 'trace-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace/graph: Add ftrace_graph_max_depth kernel parameter
tracing: Add #undef to fix compile error
jump_label: Add comment about initialization order for anonymous unions
jump_label: Fix anonymous union initialization
module: set __jump_table alignment to 8
ftrace/graph: Do not modify the EMPTY_HASH for the function_graph filter
tracing: Fix code comment for ftrace_ops_get_func()
Early trace callgraphs can be extremely large on systems with
several seconds of boot time. The max_depth parameter limits how
deep the graph trace goes and reduces the output size. This
parameter is the same as the max_graph_depth file in tracefs.
Link: http://lkml.kernel.org/r/1488499935-23216-1-git-send-email-todd.e.brandt@linux.intel.com
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
[ changed comments about debugfs to tracefs ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
On boot up, if the kernel command line sets a graph funtion with the kernel
command line options "ftrace_graph_filter" or "ftrace_graph_notrace" then it
updates the corresponding function graph hash, ftrace_graph_hash or
ftrace_graph_notrace_hash respectively. Unfortunately, at boot up, these
variables are pointers to the "EMPTY_HASH" which is a constant used as a
placeholder when a hash has no entities. The problem was that the comand
line version to set the hashes updated the actual EMPTY_HASH instead of
creating a new hash for the function graph. This broke the EMPTY_HASH
because not only did it modify a constant (not sure how that was allowed to
happen, except maybe because it was done at early boot, const variables were
still mutable), but it made the filters have functions listed in them when
they were actually empty.
The kernel command line function needs to allocate a new hash for the
function graph filters and assign the necessary variables to that new hash
instead.
Link: http://lkml.kernel.org/r/1488420091.7212.17.camel@linux.intel.com
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: b9b0c831be ("ftrace: Convert graph filter to use hash tables")
Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is no function 'ftrace_ops_recurs_func' existing in the current code,
it was renamed to ftrace_ops_assist_func() in commit c68c0fa293
("ftrace: Have ftrace_ops_get_func() handle RCU and PER_CPU flags too").
Update the comment to the correct function name.
Link: http://lkml.kernel.org/r/1487723366-14463-1-git-send-email-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently, only one function can be written to set_graph_function and
set_graph_notrace. The last function in the list will have saved, even
though other functions will be added then removed.
Change the behavior to be the same as set_ftrace_function as to allow
multiple functions to be written. If any one fails, none of them will be
added. The addition of the functions are done at the end when the file is
closed.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The hashs ftrace_graph_hash and ftrace_graph_notrace_hash are modified
within the graph_lock being held. Holding a pointer to them and passing them
along can lead to a use of a stale pointer (fgd->hash). Move assigning the
pointer and its use to within the holding of the lock. Note, it's an
rcu_sched protected data, and other instances of referencing them are done
with preemption disabled. But the file manipuation code must be protected by
the lock.
The fgd->hash pointer is set to NULL when the lock is being released.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Since reading the set_graph_functions uses seq functions, which sets the
file->private_data pointer to a seq_file descriptor. On writes the
ftrace_graph_data descriptor is set to file->private_data. But if the file
is opened for RDWR, the ftrace_graph_write() will incorrectly use the
file->private_data descriptor instead of
((struct seq_file *)file->private_data)->private pointer, and this can crash
the kernel.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When the set_graph_function or set_graph_notrace contains no records, a
banner is displayed of either "#### all functions enabled ####" or
"#### all functions disabled ####" respectively. To tell the seq operations
to do this, (void *)1 is passed as a return value. Instead of using a
hardcoded meaningless variable, define it as a macro.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This is a micro-optimization, but as it has to deal with a fast path of the
function tracer, these optimizations can be noticed.
The ftrace_lookup_ip() returns true if the given ip is found in the hash. If
it's not found or the hash is NULL, it returns false. But there's some cases
that a NULL hash is a true, and the ftrace_hash_empty() is tested before
calling ftrace_lookup_ip() in those cases. But as ftrace_lookup_ip() tests
that first, that adds a few extra unneeded instructions in those cases.
A new static "always_inlined" function is created that does not perform the
hash empty test. This most only be used by callers that do the check first
anyway, as an empty or NULL hash could cause a crash if a lookup is
performed on it.
Also add kernel doc for the ftrace_lookup_ip() main function.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Replace the couple of use cases that has small logic to produce the ftrace
function key id with a helper function. No need for duplicate code.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Use ftrace_hash instead of a static array of a fixed size. This is
useful when a graph filter pattern matches to a large number of
functions. Now hash lookup is done with preemption disabled to protect
from the hash being changed/freed.
Link: http://lkml.kernel.org/r/20170120024447.26097-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
It will be used when checking graph filter hashes later.
Link: http://lkml.kernel.org/r/20170120024447.26097-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
[ Moved ftrace_hash dec and functions outside of FUNCTION_GRAPH define ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The __ftrace_hash_move() is to allocates properly-sized hash and move
entries in the src ftrace_hash. It will be used to set function graph
filters which has nothing to do with the dyn_ftrace records.
Link: http://lkml.kernel.org/r/20170120024447.26097-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
There is no point in having an extra type for extra confusion. u64 is
unambiguous.
Conversion was done with the following coccinelle script:
@rem@
@@
-typedef u64 cycle_t;
@fix@
typedef cycle_t;
@@
-cycle_t
+u64
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
o STM can hook into the function tracer
o Function filtering now supports more advance glob matching
o Ftrace selftests updates and added tests
o Softirq tag in traces now show only softirqs
o ARM nop added to non traced locations at compile time
o New trace_marker_raw file that allows for binary input
o Optimizations to the ring buffer
o Removal of kmap in trace_marker
o Wakeup and irqsoff tracers now adhere to the set_graph_notrace file
o Other various fixes and clean ups
Note, there are two patches marked for stable. These were discovered
near the end of the 4.9 rc release cycle. By the time I had them tested
it was just a matter of days before 4.9 would be released, and I
figured I would just submit them in the merge window. They are old
bugs and not critical. Nothing non-root could abuse.
-----BEGIN PGP SIGNATURE-----
iQExBAABCAAbBQJYUrFHFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
2+AIAIr20kSQV/nA5htGAeCTobVk3WUxY6bvjd9mIJDKPP19akNLyREW0G3KnfCr
yhx4aFRZG98fRu/6F8qieRosyN36lADDVYHelMFHMpcTOpE2aZGjaaOuNGxOEA9v
FmMPTX+K3+dzKyFP4l68R3+5JuQ1/AqLTioTWeLW8IDQ2OOVsjD8+0BuXrNKMJDY
o6U4Hk5U/vn+zHc6BmgBzloAXemBd7iJ1t5V3FRRGvm8yv3HU85Twc5ofGeYTWvB
J8PboEywRlIzxg0Kd8mxnMI5PgaKZSEc2ub8E7cY/CZ5PYpDE2xDA2hJmJgfYp00
1VW+DHRpRZfElsCcya6S6P4bs5Y=
=MGZ/
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"This release has a few updates:
- STM can hook into the function tracer
- Function filtering now supports more advance glob matching
- Ftrace selftests updates and added tests
- Softirq tag in traces now show only softirqs
- ARM nop added to non traced locations at compile time
- New trace_marker_raw file that allows for binary input
- Optimizations to the ring buffer
- Removal of kmap in trace_marker
- Wakeup and irqsoff tracers now adhere to the set_graph_notrace file
- Other various fixes and clean ups"
* tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (42 commits)
selftests: ftrace: Shift down default message verbosity
kprobes/trace: Fix kprobe selftest for newer gcc
tracing/kprobes: Add a helper method to return number of probe hits
tracing/rb: Init the CPU mask on allocation
tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too
fgraph: Handle a case where a tracer ignores set_graph_notrace
tracing: Replace kmap with copy_from_user() in trace_marker writing
ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it
tracing: Allow benchmark to be enabled at early_initcall()
tracing: Have system enable return error if one of the events fail
tracing: Do not start benchmark on boot up
tracing: Have the reg function allow to fail
ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline
ring-buffer: Froce rb_update_write_stamp() to be inlined
ring-buffer: Force inline of hotpath helper functions
tracing: Make __buffer_unlock_commit() always_inline
tracing: Make tracepoint_printk a static_key
ring-buffer: Always inline rb_event_data()
ring-buffer: Make rb_reserve_next_event() always inlined
...
- Add additional checks for bad platform data
- Remove bounce buffer in console writer
- Protect read/unlink race with a mutex
- Correctly give up during dump locking failures
- Increase ftrace bandwidth by splitting ftrace buffers per CPU
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Kees Cook <kees@outflux.net>
iQIcBAABCgAGBQJYSJxYAAoJEIly9N/cbcAmYBsQAIAmHDgk3ootLQhyatZ9H2X0
Nyl24xA7UCPaz13ddF1tUaItI4mYBWfY4gde+3fIVXDitgmFxZZqb8YV68CvFgUt
Hb8tlTiM0F2z/muGBIgJ5TN5XiB4dO0WgvcKvnQdzyNGPVlAXvowHPkaM9X+iEA1
y4U2Le7iK9+9fvkH7RM4O3hMiTmpKeUITYTWo1Y8n9LaZo3w5+pqhS+TPu75uyD0
pLb53EOzZmg1nu9hcac5t4G5W1Lr4ji2EekDXemi/571HAzQnMXxJWc6ZVYLDNfP
W4D0UGcHAERDzrYwWcGn8HIThYlpbnVw9atSTTodJTiIubtsRt4haycUH1hqMS5o
4R2myhbAoM0A3zYBqrhwtQHg8apNes2hOR2WycAqgvylZZl1o6zaEs9zc7aafYuy
N/M0x5tlya3fOgkvkJsmERT5jtqDVMhtBZ2xa8NYfJCHgULaUmjEx25eTr1kF3nW
ERIX/3IayMvqHwYptP9dOzy2owLpXC8yZlM34AeM+ub93hHj1ELLfG7aN0bklD/+
wfmIX8HpOA2XGWflOk5fiHLHro6pwRU9zOIIHFJ4Tf60PMoN+rjRfej1fjz+KOhO
gxUYaCb+/4BlCqLqdFvF54qhQO2qmVuOAg/1BLu+hnZtXSyhVJxePthSs5shyoE8
owL8rVXDGapjF1xO6WCR
=UmFL
-----END PGP SIGNATURE-----
Merge tag 'pstore-v4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore updates from Kees Cook:
"Improvements and fixes to pstore subsystem:
- add additional checks for bad platform data
- remove bounce buffer in console writer
- protect read/unlink race with a mutex
- correctly give up during dump locking failures
- increase ftrace bandwidth by splitting ftrace buffers per CPU"
* tag 'pstore-v4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
ramoops: add pdata NULL check to ramoops_probe
pstore: Convert console write to use ->write_buf
pstore: Protect unlink with read_mutex
pstore: Use global ftrace filters for function trace filtering
ftrace: Provide API to use global filtering for ftrace ops
pstore: Clarify context field przs as dprzs
pstore: improve error report for failed setup
pstore: Merge per-CPU ftrace records into one
pstore: Add ftrace timestamp counter
ramoops: Split ftrace buffer space into per-CPU zones
pstore: Make ramoops_init_przs generic for other prz arrays
pstore: Allow prz to control need for locking
pstore: Warn on PSTORE_TYPE_PMSG using deprecated function
pstore: Make spinlock per zone instead of global
pstore: Actually give up during locking failure
Currently the global_ops filtering hash is not available to outside users
registering for function tracing. Provide an API for those users to be
able to choose global filtering.
This is in preparation for pstore's ftrace feature to be able to
use the global filters.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Use glob_match() to support flexible glob wildcards (*,?)
and character classes ([) for ftrace.
Since the full glob matching is slower than the current
partial matching routines(*pat, pat*, *pat*), this leaves
those routines and just add MATCH_GLOB for complex glob
expression.
e.g.
----
[root@localhost tracing]# echo 'sched*group' > set_ftrace_filter
[root@localhost tracing]# cat set_ftrace_filter
sched_free_group
sched_change_group
sched_create_group
sched_online_group
sched_destroy_group
sched_offline_group
[root@localhost tracing]# echo '[Ss]y[Ss]_*' > set_ftrace_filter
[root@localhost tracing]# head set_ftrace_filter
sys_arch_prctl
sys_rt_sigreturn
sys_ioperm
SyS_iopl
sys_modify_ldt
SyS_mmap
SyS_set_thread_area
SyS_get_thread_area
SyS_set_tid_address
sys_fork
----
Link: http://lkml.kernel.org/r/147566869501.29136.6462645009894738056.stgit@devbox
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When a module is first loaded and its function ip records are added to the
ftrace list of functions to modify, they are set to DISABLED, as their text
is still in a read only state. When the module is fully loaded, and can be
updated, the flag is cleared, and if their's any functions that should be
tracing them, it is updated at that moment.
But there's several locations that do record accounting and should ignore
records that are marked as disabled, or they can cause issues.
Alexei already fixed one location, but others need to be addressed.
Cc: stable@vger.kernel.org
Fixes: b7ffffbb46 "ftrace: Add infrastructure for delayed enabling of module functions"
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ftrace_shutdown() checks for sanity of ftrace records
and if dyn_ftrace->flags is not zero, it will warn.
It can happen that 'flags' are set to FTRACE_FL_DISABLED at this point,
since some module was loaded, but before ftrace_module_enable()
cleared the flags for this module.
In other words the module.c is doing:
ftrace_module_init(mod); // calls ftrace_update_code() that sets flags=FTRACE_FL_DISABLED
... // here ftrace_shutdown() is called that warns, since
err = prepare_coming_module(mod); // didn't have a chance to clear FTRACE_FL_DISABLED
Fix it by ignoring disabled records.
It's similar to what __ftrace_hash_rec_update() is already doing.
Link: http://lkml.kernel.org/r/1478560460-3818619-1-git-send-email-ast@fb.com
Cc: stable@vger.kernel.org
Fixes: b7ffffbb46 "ftrace: Add infrastructure for delayed enabling of module functions"
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The subtime is used only for function profiler with function graph
tracer enabled. Move the definition of subtime under
CONFIG_FUNCTION_PROFILER to reduce the memory usage. Also move the
initialization of subtime into the graph entry callback.
Link: http://lkml.kernel.org/r/20160831025529.24018-1-namhyung@kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Commit 345ddcc882 ("ftrace: Have set_ftrace_pid use the bitmap like events
do") placed ftrace_init_tracefs into the instance creation, and encapsulated
the top level updating with an if conditional, as the top level only gets
updated at boot up. Unfortunately, this triggers section mismatch errors as
the init functions are called from a function that can be called later, and
the section mismatch logic is unaware of the if conditional that would
prevent it from happening at run time.
To make everyone happy, create a separate ftrace_init_tracefs_toplevel()
routine that only gets called by init functions, and this will be what calls
other init functions for the toplevel directory.
Link: http://lkml.kernel.org/r/20160704102139.19cbc0d9@gandalf.local.home
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 345ddcc882 ("ftrace: Have set_ftrace_pid use the bitmap like events do")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Convert set_ftrace_pid to use the bitmap like set_event_pid does. This
allows for instances to use the pid filtering as well, and will allow for
function-fork option to set if the children of a traced function should be
traced or not.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1) I forgot that I had another selftest to stress test the ftrace
instance creation. It was actually suppose to go into the 4.6
merge window, but I never committed it. I almost forgot about it
again, but noticed it was missing from your tree.
2) Soumya PN sent me a clean up patch to not disable interrupts when
taking the tasklist_lock for read, as it's unnecessary because
that lock is never taken for write in irq context.
3) Newer gcc's can cause the jump in the function_graph code to the
global ftrace_stub label to be a short jump instead of a long one.
As that jump is dynamically converted to jump to the trace code to
do function graph tracing, and that conversion expects a long jump
it can corrupt the ftrace_stub itself (it's directly after that call).
One way to prevent gcc from using a short jump is to declare the
ftrace_stub as a weak function, which we do here to keep gcc from
optimizing too much.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXQhYQAAoJEKKk/i67LK/82pAH/3XzRCP366HqWnKdvluPB8vX
UnVoXGAX1Eh2ZpvlPIJBXNYOZlnGRMMMAoeI+su31FoJHrzTzfGXvRynTkZPFZtd
XakvHfACjtGtvi2MuCN1t9/d1ty/ob2o05KB9qc+JRlzHM09qTL/HX8hwZeEsMQ4
NYgEY4Y727LOSCrJieLktchpwtie77q8Wq25oiWIVWOyDjpCsPnZyaOqaQSANot9
Gd00cixbMam7Ba1BjoRsRQZaT2pYZ8vt7HDXDBfAOW1oOjalWARLhRg/zww1V3WD
DEptuEeyAgMJS3v76Z6Sbk/QM7hyGUWCcmC2qaN1yc2n1Sh+zBOiN1eyiiUh/2U=
=ERxv
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull motr tracing updates from Steven Rostedt:
"Three more changes.
- I forgot that I had another selftest to stress test the ftrace
instance creation. It was actually suppose to go into the 4.6
merge window, but I never committed it. I almost forgot about it
again, but noticed it was missing from your tree.
- Soumya PN sent me a clean up patch to not disable interrupts when
taking the tasklist_lock for read, as it's unnecessary because that
lock is never taken for write in irq context.
- Newer gcc's can cause the jump in the function_graph code to the
global ftrace_stub label to be a short jump instead of a long one.
As that jump is dynamically converted to jump to the trace code to
do function graph tracing, and that conversion expects a long jump
it can corrupt the ftrace_stub itself (it's directly after that
call). One way to prevent gcc from using a short jump is to
declare the ftrace_stub as a weak function, which we do here to
keep gcc from optimizing too much"
* tag 'trace-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace/x86: Set ftrace_stub to weak to prevent gcc from using short jumps to it
ftrace: Don't disable irqs when taking the tasklist_lock read_lock
ftracetest: Add instance created, delete, read and enable event test
In ftrace.c inside the function alloc_retstack_tasklist() (which will be
invoked when function_graph tracing is on) the tasklist_lock is being
held as reader while iterating through a list of threads. Here the lock
is being held as reader with irqs disabled. The tasklist_lock is never
write_locked in interrupt context so it is safe to not disable interrupts
for the duration of read_lock in this block which, can be significant,
given the block of code iterates through all threads. Hence changing the
code to call read_lock() and read_unlock() instead of read_lock_irqsave()
and read_unlock_irqrestore().
A similar change was made in commits: 8063e41d2f ("tracing: Change
syscall_*regfunc() to check PF_KTHREAD and use for_each_process_thread()")'
and 3472eaa1f1 ("sched: normalize_rt_tasks(): Don't use _irqsave for
tasklist_lock, use task_rq_lock()")'
Link: http://lkml.kernel.org/r/1463500874-77480-1-git-send-email-soumya.p.n@hpe.com
Signed-off-by: Soumya PN <soumya.p.n@hpe.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
In the ppc64 big endian ABI, function symbols point to function
descriptors. The symbols which point to the function entry points
have a dot in front of the function name. Consequently, when the
ftrace filter mechanism searches for the symbol corresponding to
an entry point address, it gets the dot symbol.
As a result, ftrace filter users have to be aware of this ABI detail on
ppc64 and prepend a dot to the function name when setting the filter.
The perf probe command insulates the user from this by ignoring the dot
in front of the symbol name when matching function names to symbols,
but the sysfs interface does not. This patch makes the ftrace filter
mechanism do the same when searching symbols.
Fixes the following failure in ftracetest's kprobe_ftrace.tc:
.../kprobe_ftrace.tc: line 9: echo: write error: Invalid argument
That failure is on this line of kprobe_ftrace.tc:
echo _do_fork > set_ftrace_filter
This is because there's no _do_fork entry in the functions list:
# cat available_filter_functions | grep _do_fork
._do_fork
This change introduces no regressions on the perf and ftracetest
testsuite results.
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In order to support live patching on powerpc we would like to call
ftrace_location_range(), so make it global.
Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some visible changes:
A new flag was added to distinguish traces done in NMI context.
Preempt tracer now shows functions where preemption is disabled but
interrupts are still enabled.
Other notes:
Updates were done to function tracing to allow better performance
with perf.
Infrastructure code has been added to allow for a new histogram
feature for recording live trace event histograms that can be
configured by simple user commands. The feature itself was just
finished, but needs a round in linux-next before being pulled.
This only includes some infrastructure changes that will be needed.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJW8/WPAAoJEKKk/i67LK/8wrAH/j2gU9ZfjVxTu8068TBGWRJP
yvvzq0cK5evB3dsVuUmKKRfU52nSv4J1WcFF569X0RulSLylR0dHlcxFJMn4kkgR
bm0AHRrqOf87ub3VimcpG146iVQij37l5A0SRoFbvSPLQx1KUW18v99x41Ji8dv6
oWXRc6/YhdzEE7l0nUsVjmScQ4b2emsems3cxZzXOY+nRJsiim6i+VaDeatdyey1
csLVqtRCs+x62TVtxG3+GhcLdRoPRbnHAGzrKDFIn1SrQaRXCc54wN5d2hWxjgNI
1laOwaj070lnJiWfBLIP/K+lx+VKRx5/O0rKZX35foLUTqJJKSyjAbKXuMCcSAM=
=2h2K
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Nothing major this round. Mostly small clean ups and fixes.
Some visible changes:
- A new flag was added to distinguish traces done in NMI context.
- Preempt tracer now shows functions where preemption is disabled but
interrupts are still enabled.
Other notes:
- Updates were done to function tracing to allow better performance
with perf.
- Infrastructure code has been added to allow for a new histogram
feature for recording live trace event histograms that can be
configured by simple user commands. The feature itself was just
finished, but needs a round in linux-next before being pulled.
This only includes some infrastructure changes that will be needed"
* tag 'trace-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (22 commits)
tracing: Record and show NMI state
tracing: Fix trace_printk() to print when not using bprintk()
tracing: Remove redundant reset per-CPU buff in irqsoff tracer
x86: ftrace: Fix the misleading comment for arch/x86/kernel/ftrace.c
tracing: Fix crash from reading trace_pipe with sendfile
tracing: Have preempt(irqs)off trace preempt disabled functions
tracing: Fix return while holding a lock in register_tracer()
ftrace: Use kasprintf() in ftrace_profile_tracefs()
ftrace: Update dynamic ftrace calls only if necessary
ftrace: Make ftrace_hash_rec_enable return update bool
tracing: Fix typoes in code comment and printk in trace_nop.c
tracing, writeback: Replace cgroup path to cgroup ino
tracing: Use flags instead of bool in trigger structure
tracing: Add an unreg_all() callback to trigger commands
tracing: Add needs_rec flag to event triggers
tracing: Add a per-event-trigger 'paused' field
tracing: Add get_syscall_name()
tracing: Add event record param to trigger_ops.func()
tracing: Make event trigger functions available
tracing: Make ftrace_event_field checking functions available
...
Use the more common logging method with the eventual goal of removing
pr_warning altogether.
Miscellanea:
- Realign arguments
- Coalesce formats
- Add missing space between a few coalesced formats
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [kernel/power/suspend.c]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently dynamic ftrace calls are updated any time
the ftrace_ops is un/registered. If we do this update
only when it's needed, we save lot of time for perf
system wide ftrace function sampling/counting.
The reason is that for system wide sampling/counting,
perf creates event for each cpu in the system.
Each event then registers separate copy of ftrace_ops,
which ends up in FTRACE_UPDATE_CALLS updates. On servers
with many cpus that means serious stall (240 cpus server):
Counting:
# time ./perf stat -e ftrace:function -a sleep 1
Performance counter stats for 'system wide':
370,663 ftrace:function
1.401427505 seconds time elapsed
real 3m51.743s
user 0m0.023s
sys 3m48.569s
Sampling:
# time ./perf record -e ftrace:function -a sleep 1
[ perf record: Woken up 0 times to write data ]
Warning:
Processed 141200 events and lost 5 chunks!
[ perf record: Captured and wrote 10.703 MB perf.data (135950 samples) ]
real 2m31.429s
user 0m0.213s
sys 2m29.494s
There's no reason to do the FTRACE_UPDATE_CALLS update
for each event in perf case, because all the ftrace_ops
always share the same filter, so the updated calls are
always the same.
It's required that only first ftrace_ops registration
does the FTRACE_UPDATE_CALLS update (also sometimes
the second if the first one used the trampoline), but
the rest can be only cheaply linked into the ftrace_ops
list.
Counting:
# time ./perf stat -e ftrace:function -a sleep 1
Performance counter stats for 'system wide':
398,571 ftrace:function
1.377503733 seconds time elapsed
real 0m2.787s
user 0m0.005s
sys 0m1.883s
Sampling:
# time ./perf record -e ftrace:function -a sleep 1
[ perf record: Woken up 0 times to write data ]
Warning:
Processed 261730 events and lost 9 chunks!
[ perf record: Captured and wrote 19.907 MB perf.data (256293 samples) ]
real 1m31.948s
user 0m0.309s
sys 1m32.051s
Link: http://lkml.kernel.org/r/1458138873-1553-6-git-send-email-jolsa@kernel.org
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Change __ftrace_hash_rec_update to return true in case
we need to update dynamic ftrace call records. It return
false in case no update is needed.
Link: http://lkml.kernel.org/r/1458138873-1553-5-git-send-email-jolsa@kernel.org
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Remove the ftrace module notifier in favor of directly calling
ftrace_module_enable() and ftrace_release_mod() in the module loader.
Hard-coding the function calls directly in the module loader removes
dependence on the module notifier call chain and provides better
visibility and control over what gets called when, which is important
to kernel utilities such as livepatch.
This fixes a notifier ordering issue in which the ftrace module notifier
(and hence ftrace_module_enable()) for coming modules was being called
after klp_module_notify(), which caused livepatch modules to initialize
incorrectly. This patch removes dependence on the module notifier call
chain in favor of hard coding the corresponding function calls in the
module loader. This ensures that ftrace and livepatch code get called in
the correct order on patch module load and unload.
Fixes: 5156dca34a ("ftrace: Fix the race between ftrace and insmod")
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
We hit ftrace_bug report when booting Android on a 64bit ATOM SOC chip.
Basically, there is a race between insmod and ftrace_run_update_code.
After load_module=>ftrace_module_init, another thread jumps in to call
ftrace_run_update_code=>ftrace_arch_code_modify_prepare
=>set_all_modules_text_rw, to change all modules
as RW. Since the new module is at MODULE_STATE_UNFORMED, the text attribute
is not changed. Then, the 2nd thread goes ahead to change codes.
However, load_module continues to call complete_formation=>set_section_ro_nx,
then 2nd thread would fail when probing the module's TEXT.
The patch fixes it by using notifier to delay the enabling of ftrace
records to the time when module is at state MODULE_STATE_COMING.
Link: http://lkml.kernel.org/r/567CE628.3000609@intel.com
Signed-off-by: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Qiu Peiyang pointed out that there's a race when enabling function tracing
and loading a module. In order to make the modifications of converting nops
in the prologue of functions into callbacks, the text needs to be converted
from read-only to read-write. When enabling function tracing, the text
permission is updated, the functions are modified, and then they are put
back.
When loading a module, the updates to convert function calls to mcount is
done before the module text is set to read-only. But after it is done, the
module text is visible by the function tracer. Thus we have the following
race:
CPU 0 CPU 1
----- -----
start function tracing
set text to read-write
load_module
add functions to ftrace
set module text read-only
update all functions to callbacks
modify module functions too
< Can't it's read-only >
When this happens, ftrace detects the issue and disables itself till the
next reboot.
To fix this, a new DISABLED flag is added for ftrace records, which all
module functions get when they are added. Then later, after the module code
is all set, the records will have the DISABLED flag cleared, and they will
be enabled if any callback wants all functions to be traced.
Note, this doesn't add the delay to later. It simply changes the
ftrace_module_init() to do both the setting of DISABLED records, and then
immediately calls the enable code. This helps with testing this new code as
it has the same behavior as previously. Another change will come after this
to have the ftrace_module_enable() called after the text is set to
read-only.
Cc: Qiu Peiyang <peiyangx.qiu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The start and end variables were only used when ftrace_module_init() was
split up into multiple functions. No need to keep them around after the
merger.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Jiri Olsa noted that the change to replace the control_ops did not update
the trampoline for when running perf on a single CPU and with CONFIG_PREEMPT
disabled (where dynamic ops, like perf, can use trampolines directly). The
result was that perf function could be called when RCU is not watching as
well as not handle the ftrace_local_disable().
Modify the ftrace_ops_get_func() to also check the RCU and PER_CPU ops flags
and use the recursive function if they are set. The recursive function is
modified to check those flags and execute the appropriate checks if they are
set.
Link: http://lkml.kernel.org/r/20151201134213.GA14155@krava.brq.redhat.com
Reported-by: Jiri Olsa <jolsa@redhat.com>
Patch-fixed-up-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently perf has its own list function within the ftrace infrastructure
that seems to be used only to allow for it to have per-cpu disabling as well
as a check to make sure that it's not called while RCU is not watching. It
uses something called the "control_ops" which is used to iterate over ops
under it with the control_list_func().
The problem is that this control_ops and control_list_func unnecessarily
complicates the code. By replacing FTRACE_OPS_FL_CONTROL with two new flags
(FTRACE_OPS_FL_RCU and FTRACE_OPS_FL_PER_CPU) we can remove all the code
that is special with the control ops and add the needed checks within the
generic ftrace_list_func().
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When showing all tramps registered to a ftrace record in the file
enabled_functions, it exits the loop with ops == NULL. But then it is
suppose to show the function on the ops->trampoline and
add_trampoline_func() is called with the given ops. But because ops is now
NULL (to exit the loop), it always shows the static trampoline instead of
the one that is really registered to the record.
The call to add_trampoline_func() that shows the trampoline for the given
ops needs to be called at every iteration.
Fixes: 39daa7b9e8 "ftrace: Show all tramps registered to a record on ftrace_bug()"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When an anomaly is detected in the function call modification code,
ftrace_bug() is called to disable function tracing as well as give any
information that may help debug the problem. Currently, only the first found
trampoline that is attached to the failed record is reported. Instead, show
all trampolines that are hooked to it.
Also, not only show the ops pointer but also report the function it calls.
While at it, add this info to the enabled_functions debug file too.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When an anomaly is found while modifying function code, ftrace_bug() is
called which disables the function tracing infrastructure and reports
information about what failed. If the code that is to be replaced does not
match what is expected, then actual code is shown. Currently there is no
arch generic way to show what was expected.
Add a new variable pointer calld ftrace_expected that the arch code can set
to point to what it expected so that ftrace_bug() can report the actual text
as well as the text that was expected to be there.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The ftrace function hook utility has several internal checks to make sure
that whatever it modifies is exactly what it expects to be modifying. This
is essential as modifying running code can be extremely dangerous to the
system.
When an anomaly is detected, ftrace_bug() is called which sends a splat to
the console and disables function tracing. There's some extra information
that is printed to help diagnose the issue.
One thing that is missing though is output of what ftrace was doing at the
time of the crash. Was it updating a call site or perhaps converting a call
site to a nop? A new global enum variable is created to state what ftrace
was doing at the time of the anomaly, and this is reported in ftrace_bug().
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
stable tags to them. I searched through my INBOX just as the merge window
opened and found lots of patches to pull. I ran them through all my tests
and they were in linux-next for a few days.
Features added this release:
----------------------------
o Module globbing. You can now filter function tracing to several
modules. # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov)
o Tracer specific options are now visible even when the tracer is not
active. It was rather annoying that you can only see and modify tracer
options after enabling the tracer. Now they are in the options/ directory
even when the tracer is not active. Although they are still only visible
when the tracer is active in the trace_options file.
o Trace options are now per instance (although some of the tracer specific
options are global)
o New tracefs file: set_event_pid. If any pid is added to this file, then
all events in the instance will filter out events that are not part of
this pid. sched_switch and sched_wakeup events handle next and the wakee
pids.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWPLQ5AAoJEKKk/i67LK/8CTYIAI1u8DE5QCzv3J0p54jVpNVR
J5FqEU3eXIzd6FS4JXD4nxCeMpUZAy21YnhlZpsnrbJJM5bc9bUsBCwiKKM+MuSZ
ztmy2sgYKkO0h/KUdhNgYJrzis3/Ojquyx9iAqK5ST/Fr+nKYx81akFKjNK53iur
RJRut45sSa8rv11LaL8sgJ6hAWQTc+YkybUdZ5xaMdJmZ6A61T7Y6VzTjbUexuvL
hntCfTjYLtVd8dbfknAnf3B7n/VOO3IFF85wr7ciYR5oEVfPrF8tHmJBlhHExPpX
kaXAiDDRY/UTg/5DQqnp4zmxJoR5BQ2l4pT5PwiLcnwhcphIDNYS8EYUmOYAWjU=
=TjOE
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracking updates from Steven Rostedt:
"Most of the changes are clean ups and small fixes. Some of them have
stable tags to them. I searched through my INBOX just as the merge
window opened and found lots of patches to pull. I ran them through
all my tests and they were in linux-next for a few days.
Features added this release:
----------------------------
- Module globbing. You can now filter function tracing to several
modules. # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov)
- Tracer specific options are now visible even when the tracer is not
active. It was rather annoying that you can only see and modify
tracer options after enabling the tracer. Now they are in the
options/ directory even when the tracer is not active. Although
they are still only visible when the tracer is active in the
trace_options file.
- Trace options are now per instance (although some of the tracer
specific options are global)
- New tracefs file: set_event_pid. If any pid is added to this file,
then all events in the instance will filter out events that are not
part of this pid. sched_switch and sched_wakeup events handle next
and the wakee pids"
* tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits)
tracefs: Fix refcount imbalance in start_creating()
tracing: Put back comma for empty fields in boot string parsing
tracing: Apply tracer specific options from kernel command line.
tracing: Add some documentation about set_event_pid
ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmark
tracing: Allow dumping traces without tracking trace started cpus
ring_buffer: Fix more races when terminating the producer in the benchmark
ring_buffer: Do no not complete benchmark reader too early
tracing: Remove redundant TP_ARGS redefining
tracing: Rename max_stack_lock to stack_trace_max_lock
tracing: Allow arch-specific stack tracer
recordmcount: arm64: Replace the ignored mcount call into nop
recordmcount: Fix endianness handling bug for nop_mcount
tracepoints: Fix documentation of RCU lockdep checks
tracing: ftrace_event_is_function() can return boolean
tracing: is_legal_op() can return boolean
ring-buffer: rb_event_is_commit() can return boolean
ring-buffer: rb_per_cpu_empty() can return boolean
ring_buffer: ring_buffer_empty{cpu}() can return boolean
ring-buffer: rb_is_reader_page() can return boolean
...
Extend module command for function filter selection with globbing.
It uses the same globbing as function filter.
sh# echo '*alloc*:mod:*' > set_ftrace_filter
Will trace any function with the letters 'alloc' in the name in any
module but not in kernel.
sh# echo '!*alloc*:mod:ipv6' >> set_ftrace_filter
Will prevent from tracing functions with 'alloc' in the name from module
ipv6 (do not forget to append to set_ftrace_filter file).
sh# echo '*alloc*:mod:!ipv6' > set_ftrace_filter
Will trace functions with 'alloc' in the name from kernel and any
module except ipv6.
sh# echo '*alloc*:mod:!*' > set_ftrace_filter
Will trace any function with the letters 'alloc' in the name only from
kernel, but not from any module.
sh# echo '*:mod:!*' > set_ftrace_filter
or
sh# echo ':mod:!' > set_ftrace_filter
Will trace every function in the kernel, but will not trace functions
from any module.
sh# echo '*:mod:*' > set_ftrace_filter
or
sh# echo ':mod:' > set_ftrace_filter
As the opposite will trace all functions from all modules, but not from
kernel.
sh# echo '*:mod:*snd*' > set_ftrace_filter
Will trace your sound drivers only (if any).
Link: http://lkml.kernel.org/r/1443545176-3215-4-git-send-email-0x7f454c46@gmail.com
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
[ Made format changes ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ftrace_match parameters are very related and I reduce the number of local
variables & parameters with it.
This is also preparation for module globbing as it would introduce more
realated variables & parameters.
Link: http://lkml.kernel.org/r/1443545176-3215-3-git-send-email-0x7f454c46@gmail.com
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
[ Made some formatting changes ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
"Not" is too abstract variable name - changed to clear_filter.
Removed ftrace_match_module_records function: comparison with !* or *
not does the general code in filter_parse_regex() as it works without
mod command for
sh# echo '!*' > /sys/kernel/debug/tracing/set_ftrace_filter
Link: http://lkml.kernel.org/r/1443545176-3215-2-git-send-email-0x7f454c46@gmail.com
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
By now there isn't any subcommand for mod.
Before:
sh$ echo '*:mod:ipv6:a' > set_ftrace_filter
sh$ echo '*:mod:ipv6' > set_ftrace_filter
had the same results, but now first will result in:
sh$ echo '*:mod:ipv6:a' > set_ftrace_filter
-bash: echo: write error: Invalid argument
Also, I clarified ftrace_mod_callback code a little.
Link: http://lkml.kernel.org/r/1443545176-3215-1-git-send-email-0x7f454c46@gmail.com
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
[ converted 'if (ret == 0)' to 'if (!ret)' ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
__trace_sched_switch_state() is the last remaining PREEMPT_ACTIVE
user, move trace_sched_switch() from prepare_task_switch() to
__schedule() and propagate the @preempt argument.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
To cover the common case of sorting an array of pointers, Daniel
Wagner recently modified the library sort() to use a specific swap
function for size==8, in addition to the size==4 case which was
already handled. Since sizeof(long) is either 4 or 8,
ftrace_swap_ips() is redundant and we can just let sort() pick an
appropriate and fast swap callback.
Link: http://lkml.kernel.org/r/1441834023-13130-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The sleep-time and graph-time options are only for the function graph tracer
and are not used by anything else. As tracer options are now visible when
the tracer is not activated, its better to move the function graph specific
tracer options into the function graph tracer.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The changes with more meat are:
o Allowing the trace event filters to filter on CPU number and process ids
o Two new markers for trace output latency were added
(10 and 100 msec latencies)
o Have tracing_thresh filter function profiling time
I also worked on modifying the ring buffer code for some future
work, and moved the adding of the timestamp around. One of my changes
caused a regression, and since other changes were built on top of it
and already tested, I had to operate a revert of that change. Instead
of rebasing, this change set has the code that caused a regression
as well as the code to revert that change without touching the other
changes that were made on top of it.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJV6aZEAAoJEEjnJuOKh9ldrR4H/A1RcQf1prLLoUibPP4w3lat
dmQcdpS1NY+cqyiKuKPAOkFDGQL7qWzRqZ8whcPSJIsHq57ufqNSLf+0bbQYPzg9
g3CgGL7OApmGi5ulj0sNxhadvc9TFm/SAN0nVJlNuUWdm8e1UWHLsrJZaMfopu2r
RDEtkOhg619mhDL4rktNdS6rk0B92Fhu2o2PwLZPVlUl1NNEt4WJU+ejitXUVO1A
Nb70/rTGGJKtyHbW+74on4LnEN5Uu0Viu6rMwGfYyIgRmC2otdBDvE4xfKMiTUKr
SzBjzrhIoMIRn4Vl0vElfulkpYaw7pcC2BdpZ4d9VpIOiLSlZs0x/TgCtpFEv5M=
=baZ3
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing update from Steven Rostedt:
"Mostly this is just clean ups and micro optimizations.
The changes with more meat are:
- Allowing the trace event filters to filter on CPU number and
process ids
- Two new markers for trace output latency were added (10 and 100
msec latencies)
- Have tracing_thresh filter function profiling time
I also worked on modifying the ring buffer code for some future work,
and moved the adding of the timestamp around. One of my changes
caused a regression, and since other changes were built on top of it
and already tested, I had to operate a revert of that change. Instead
of rebasing, this change set has the code that caused a regression as
well as the code to revert that change without touching the other
changes that were made on top of it"
* tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Revert "ring-buffer: Get timestamp after event is allocated"
tracing: Don't make assumptions about length of string on task rename
tracing: Allow triggers to filter for CPU ids and process names
ftrace: Format MCOUNT_ADDR address as type unsigned long
tracing: Introduce two additional marks for delay
ftrace: Fix function_graph duration spacing with 7-digits
ftrace: add tracing_thresh to function profile
tracing: Clean up stack tracing and fix fentry updates
ring-buffer: Reorganize function locations
ring-buffer: Make sure event has enough room for extend and padding
ring-buffer: Get timestamp after event is allocated
ring-buffer: Move the adding of the extended timestamp out of line
ring-buffer: Add event descriptor to simplify passing data
ftrace: correct the counter increment for trace_buffer data
tracing: Fix for non-continuous cpu ids
tracing: Prefer kcalloc over kzalloc with multiply
Commit 4104d326b6 ("ftrace: Remove global function list and call function
directly") simplified the ftrace code by removing the global_ops list with a
new design. But this cleanup also broke the filtering of PIDs that are added
to the set_ftrace_pid file.
Add back the proper hooks to have pid filtering working once again.
Cc: stable@vger.kernel.org # 3.16+
Reported-by: Matt Fleming <matt@console-pimps.org>
Reported-by: Richard Weinberger <richard.weinberger@gmail.com>
Tested-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This patch extends tracing_thresh functionality to function profile tracer.
If tracing_thresh is set, print those entries only,
whose average is > tracing thresh.
Link: http://lkml.kernel.org/r/1434972488-8571-1-git-send-email-umesh.t@samsung.com
Signed-off-by: Umesh Tiwari <umesh.t@samsung.com>
[ Removed unnecessary 'moved' comment ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.
Tracepoints have helper functions for the TP_printk() called
__print_symbolic() and __print_flags() that lets a numeric number be
displayed as a a human comprehensible text. What is placed in the
TP_printk() is also shown in the tracepoint format file such that
user space tools like perf and trace-cmd can parse the binary data
and express the values too. Unfortunately, the way the TRACE_EVENT()
macro works, anything placed in the TP_printk() will be shown pretty
much exactly as is. The problem arises when enums are used. That's
because unlike macros, enums will not be changed into their values
by the C pre-processor. Thus, the enum string is exported to the
format file, and this makes it useless for user space tools.
The TRACE_DEFINE_ENUM() solves this by converting the enum strings
in the TP_printk() format into their number, and that is what is
shown to user space. For example, the tracepoint tlb_flush currently
has this in its format file:
__print_symbolic(REC->reason,
{ TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
{ TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
{ TLB_LOCAL_SHOOTDOWN, "local shootdown" },
{ TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })
After adding:
TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);
Its format file will contain this:
__print_symbolic(REC->reason,
{ 0, "flush on task switch" },
{ 1, "remote shootdown" },
{ 2, "local shootdown" },
{ 3, "local mm shootdown" })
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJVLBTuAAoJEEjnJuOKh9ldjHMIALdRS755TXCZGOf0r7O2akOR
wMPeum7C+ae1mH+jCsJKUC0/jUfQKaMt/UxoHlipDgcGg8kD2jtGnGCw4Xlwvdsr
y4rFmcTRSl1mo0zDSsg6ujoupHlVYN0+JPjrd7S3cv/llJoY49zcanNLF7S2XLeM
dZCtWRLWYpBiWO68ai6AqJTnE/eGFIqBI048qb5Eg8dbK243SSeSIf9Ywhb+VsA+
aq6F7cWI/H6j4tbeza8tAN19dcwenDro5EfCDY8ARQHJu1f6Y3+DLf2imjkd6Aiu
JVAoGIjHIpI+djwCZC1u4gi4urjfOqYartrM3Q54tb3YWYqHeNqP2ASI2a4EpYk=
=Ixwt
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Some clean ups and small fixes, but the biggest change is the addition
of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.
Tracepoints have helper functions for the TP_printk() called
__print_symbolic() and __print_flags() that lets a numeric number be
displayed as a a human comprehensible text. What is placed in the
TP_printk() is also shown in the tracepoint format file such that user
space tools like perf and trace-cmd can parse the binary data and
express the values too. Unfortunately, the way the TRACE_EVENT()
macro works, anything placed in the TP_printk() will be shown pretty
much exactly as is. The problem arises when enums are used. That's
because unlike macros, enums will not be changed into their values by
the C pre-processor. Thus, the enum string is exported to the format
file, and this makes it useless for user space tools.
The TRACE_DEFINE_ENUM() solves this by converting the enum strings in
the TP_printk() format into their number, and that is what is shown to
user space. For example, the tracepoint tlb_flush currently has this
in its format file:
__print_symbolic(REC->reason,
{ TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
{ TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
{ TLB_LOCAL_SHOOTDOWN, "local shootdown" },
{ TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })
After adding:
TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);
Its format file will contain this:
__print_symbolic(REC->reason,
{ 0, "flush on task switch" },
{ 1, "remote shootdown" },
{ 2, "local shootdown" },
{ 3, "local mm shootdown" })"
* tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits)
tracing: Add enum_map file to show enums that have been mapped
writeback: Export enums used by tracepoint to user space
v4l: Export enums used by tracepoints to user space
SUNRPC: Export enums in tracepoints to user space
mm: tracing: Export enums in tracepoints to user space
irq/tracing: Export enums in tracepoints to user space
f2fs: Export the enums in the tracepoints to userspace
net/9p/tracing: Export enums in tracepoints to userspace
x86/tlb/trace: Export enums in used by tlb_flush tracepoint
tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM()
tracing: Allow for modules to convert their enums to values
tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation
tracing: Give system name a pointer
brcmsmac: Move each system tracepoints to their own header
iwlwifi: Move each system tracepoints to their own header
mac80211: Move message tracepoints to their own header
tracing: Add TRACE_SYSTEM_VAR to xhci-hcd
tracing: Add TRACE_SYSTEM_VAR to kvm-s390
tracing: Add TRACE_SYSTEM_VAR to intel-sst
...
more than one release, as I had it ready for the 4.0 merge window, but
a last minute thing that needed to go into Linux first had to be done.
That was that perf hard coded the file system number when reading
/sys/kernel/debugfs/tracing directory making sure that the path had
the debugfs mount # before it would parse the tracing file. This broke
other use cases of perf, and the check is removed.
Now when mounting /sys/kernel/debug, tracefs is automatically mounted
in /sys/kernel/debug/tracing such that old tools will still see that
path as expected. But now system admins can mount tracefs directly
and not need to mount debugfs, which can expose security issues.
A new directory is created when tracefs is configured such that
system admins can now mount it separately (/sys/kernel/tracing).
This branch is based off of Al Viro's vfs debugfs_automount branch
at commit 163f9eb95a
debugfs: Provide a file creation function that also takes an initial size
to get the debugfs_create_automount() operation.
I just noticed that Al rebased the pull to add his Signed-off-by to
that commit, and the commit is now e59b4e9187.
I did a git diff of those two and see they are the same. Only the
latter has Al's SOB.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJVLA6YAAoJEEjnJuOKh9ldv6AH/1JUINDQwV+M0VTwzbLogloo
Sco0byLhskmx5KLVD7Vs8BJAGrgHTdit32kzBGmLGJvVCKBa+c8lwmRw6rnXg3uX
K4kGp7BIyn1/geXoGpCmDKaLGXhDcw49hRzejKDg/OqFtxKTsSeQtG8fo29ps9Do
0VaF6UDp8gYplC2N2BfpB59LVndrITQ3mSsBBeFPvS7IxFJXAhDBOq2yi0aI6HyJ
ICo2L/bA9HLxMuceWrXbsun+RP68+AQlnFfAtok7AcuBzUYPCKY0shT2VMOUtpTt
1dGxMxq6q1ACfmY7gbp47WMX9aKjWcSEr0V+IYx/xex6Maf0Xsujsy99bKYUWvs=
=OcgU
-----END PGP SIGNATURE-----
Merge tag 'trace-4.1-tracefs' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracefs from Steven Rostedt:
"This adds the new tracefs file system.
This has been in linux-next for more than one release, as I had it
ready for the 4.0 merge window, but a last minute thing that needed to
go into Linux first had to be done. That was that perf hard coded the
file system number when reading /sys/kernel/debugfs/tracing directory
making sure that the path had the debugfs mount # before it would
parse the tracing file. This broke other use cases of perf, and the
check is removed.
Now when mounting /sys/kernel/debug, tracefs is automatically mounted
in /sys/kernel/debug/tracing such that old tools will still see that
path as expected. But now system admins can mount tracefs directly
and not need to mount debugfs, which can expose security issues. A
new directory is created when tracefs is configured such that system
admins can now mount it separately (/sys/kernel/tracing)"
* tag 'trace-4.1-tracefs' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Have mkdir and rmdir be part of tracefs
tracefs: Add directory /sys/kernel/tracing
tracing: Automatically mount tracefs on debugfs/tracing
tracing: Convert the tracing facility over to use tracefs
tracefs: Add new tracefs file system
tracing: Create cmdline tracer options on tracing fs init
tracing: Only create tracer options files if directory exists
debugfs: Provide a file creation function that also takes an initial size
Dynamically allocated trampolines call ftrace_ops_get_func to get the
function which they should call. For dynamic fops (FTRACE_OPS_FL_DYNAMIC
flag is set) ftrace_ops_list_func is always returned. This is reasonable
for static trampolines but goes against the main advantage of dynamic
ones, that is avoidance of going through the list of all registered
callbacks for functions that are only being traced by a single callback.
We can fix it by returning ops->func (or recursion safe version) from
ftrace_ops_get_func whenever it is possible for dynamic trampolines.
Note that dynamic trampolines are not allowed for dynamic fops if
CONFIG_PREEMPT=y.
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1501291023000.25445@pobox.suse.cz
Link: http://lkml.kernel.org/r/1424357773-13536-1-git-send-email-mbenes@suse.cz
Reported-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Some archs (specifically PowerPC), are sensitive with the ordering of
the enabling of the calls to function tracing and setting of the
function to use to be traced.
That is, update_ftrace_function() sets what function the ftrace_caller
trampoline should call. Some archs require this to be set before
calling ftrace_run_update_code().
Another bug was discovered, that ftrace_startup_sysctl() called
ftrace_run_update_code() directly. If the function the ftrace_caller
trampoline changes, then it will not be updated. Instead a call
to ftrace_startup_enable() should be called because it tests to see
if the callback changed since the code was disabled, and will
tell the arch to update appropriately. Most archs do not need this
notification, but PowerPC does.
The problem could be seen by the following commands:
# echo 0 > /proc/sys/kernel/ftrace_enabled
# echo function > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /proc/sys/kernel/ftrace_enabled
# cat /sys/kernel/debug/tracing/trace
The trace will show that function tracing was not active.
Cc: stable@vger.kernel.org # 2.6.27+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When ftrace is enabled globally through the proc interface, we must check if
ftrace_graph_active is set. If it is set, then we should also pass the
FTRACE_START_FUNC_RET command to ftrace_run_update_code(). Similarly, when
ftrace is disabled globally through the proc interface, we must check if
ftrace_graph_active is set. If it is set, then we should also pass the
FTRACE_STOP_FUNC_RET command to ftrace_run_update_code().
Consider the following situation.
# echo 0 > /proc/sys/kernel/ftrace_enabled
After this ftrace_enabled = 0.
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
Since ftrace_enabled = 0, ftrace_enable_ftrace_graph_caller() is never
called.
# echo 1 > /proc/sys/kernel/ftrace_enabled
Now ftrace_enabled will be set to true, but still
ftrace_enable_ftrace_graph_caller() will not be called, which is not
desired.
Further if we execute the following after this:
# echo nop > /sys/kernel/debug/tracing/current_tracer
Now since ftrace_enabled is set it will call
ftrace_disable_ftrace_graph_caller(), which causes a kernel warning on
the ARM platform.
On the ARM platform, when ftrace_enable_ftrace_graph_caller() is called,
it checks whether the old instruction is a nop or not. If it's not a nop,
then it returns an error. If it is a nop then it replaces instruction at
that address with a branch to ftrace_graph_caller.
ftrace_disable_ftrace_graph_caller() behaves just the opposite. Therefore,
if generic ftrace code ever calls either ftrace_enable_ftrace_graph_caller()
or ftrace_disable_ftrace_graph_caller() consecutively two times in a row,
then it will return an error, which will cause the generic ftrace code to
raise a warning.
Note, x86 does not have an issue with this because the architecture
specific code for ftrace_enable_ftrace_graph_caller() and
ftrace_disable_ftrace_graph_caller() does not check the previous state,
and calling either of these functions twice in a row has no ill effect.
Link: http://lkml.kernel.org/r/e4fbe64cdac0dd0e86a3bf914b0f83c0b419f146.1425666454.git.panand@redhat.com
Cc: stable@vger.kernel.org # 2.6.31+
Signed-off-by: Pratyush Anand <panand@redhat.com>
[
removed extra if (ftrace_start_up) and defined ftrace_graph_active as 0
if CONFIG_FUNCTION_GRAPH_TRACER is not set.
]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When /proc/sys/kernel/ftrace_enabled is set to zero, all function
tracing is disabled. But the records that represent the functions
still hold information about the ftrace_ops that are hooked to them.
ftrace_ops may request "REGS" (have a full set of pt_regs passed to
the callback), or "TRAMP" (the ops has its own trampoline to use).
When the record is updated to represent the state of the ops hooked
to it, it sets "REGS_EN" and/or "TRAMP_EN" to state that the callback
points to the correct trampoline (REGS has its own trampoline).
When ftrace_enabled is set to zero, all ftrace locations are a nop,
so they do not point to any trampoline. But the _EN flags are still
set. This can cause the accounting to go wrong when ftrace_enabled
is cleared and an ops that has a trampoline is registered or unregistered.
For example, the following will cause ftrace to crash:
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
# echo 0 > /proc/sys/kernel/ftrace_enabled
# echo nop > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /proc/sys/kernel/ftrace_enabled
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
As function_graph uses a trampoline, when ftrace_enabled is set to zero
the updates to the record are not done. When enabling function_graph
again, the record will still have the TRAMP_EN flag set, and it will
look for an op that has a trampoline other than the function_graph
ops, and fail to find one.
Cc: stable@vger.kernel.org # 3.17+
Reported-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
o Several clean ups to the code
One such clean up was to convert to 64 bit time keeping, in the
ring buffer benchmark code.
o Adding of __print_array() helper macro for TRACE_EVENT()
o Updating the sample/trace_events/ to add samples of different ways to
make trace events. Lots of features have been added since the sample
code was made, and these features are mostly unknown. Developers
have been making their own hacks to do things that are already available.
o Performance improvements. Most notably, I found a performance bug where
a waiter that is waiting for a full page from the ring buffer will
see that a full page is not available, and go to sleep. The sched
event caused by it going to sleep would cause it to wake up again.
It would see that there was still not a full page, and go back to sleep
again, and that would wake it up again, until finally it would see a
full page. This change has been marked for stable.
Other improvements include removing global locks from fast paths.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU3M+GAAoJEEjnJuOKh9ldpWQIAJTUzeVXlU0cf3bVn768VW7e
XS41WHF34l1tNevmKTh6fCPiw8+U0UMGLQt5WKtyaaARsZn2MlefLVuvHPKFlK2w
+qcI4OEVHH97Qgf9HWJSsYgnZaOnOE+TENqnokEgXMimRMuVcd/S4QaGxwJVDcjm
iBF5j2TaG4aGbx4a3J7KueoZ3K+39r3ut15hIGi/IZBZldQ1pt26ytafD/KA3CU3
BLRM2HLttAMsV1ds0EDLgZjSGICVetFcdOmI5Gwj7Qr3KrOTRPYJMNc8NdDL7Js9
v8VhujhFGvcCrhO/IKpVvd9yluz3RCF+Z7ihc+D/+1B3Nsm0PTwN3Fl5J+f89AA=
=u2Mm
-----END PGP SIGNATURE-----
Merge tag 'trace-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The updates included in this pull request for ftrace are:
o Several clean ups to the code
One such clean up was to convert to 64 bit time keeping, in the
ring buffer benchmark code.
o Adding of __print_array() helper macro for TRACE_EVENT()
o Updating the sample/trace_events/ to add samples of different ways
to make trace events. Lots of features have been added since the
sample code was made, and these features are mostly unknown.
Developers have been making their own hacks to do things that are
already available.
o Performance improvements. Most notably, I found a performance bug
where a waiter that is waiting for a full page from the ring buffer
will see that a full page is not available, and go to sleep. The
sched event caused by it going to sleep would cause it to wake up
again. It would see that there was still not a full page, and go
back to sleep again, and that would wake it up again, until finally
it would see a full page. This change has been marked for stable.
Other improvements include removing global locks from fast paths"
* tag 'trace-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Do not wake up a splice waiter when page is not full
tracing: Fix unmapping loop in tracing_mark_write
tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT()
tracing: Add TRACE_EVENT_FN example
tracing: Add TRACE_EVENT_CONDITION sample
tracing: Update the TRACE_EVENT fields available in the sample code
tracing: Separate out initializing top level dir from instances
tracing: Make tracing_init_dentry_tr() static
trace: Use 64-bit timekeeping
tracing: Add array printing helper
tracing: Remove newline from trace_printk warning banner
tracing: Use IS_ERR() check for return value of tracing_init_dentry()
tracing: Remove unneeded includes of debugfs.h and fs.h
tracing: Remove taking of trace_types_lock in pipe files
tracing: Add ref count to tracer for when they are being read by pipe
debugfs was fine for the tracing facility as a quick way to get
an interface. Now that tracing has matured, it should separate itself
from debugfs such that it can be mounted separately without needing
to mount all of debugfs with it. That is, users resist using tracing
because it requires mounting debugfs. Having tracing have its own file
system lets users get the features of tracing without needing to bring
in the rest of the kernel's debug infrastructure.
Another reason for tracefs is that debubfs does not support mkdir.
Currently, to create instances, one does a mkdir in the tracing/instance
directory. This is implemented via a hack that forces debugfs to do
something it is not intended on doing. By converting over to tracefs, this
hack can be removed and mkdir can be properly implemented. This patch does
not address this yet, but it lays the ground work for that to be done.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Pull in Al Viro's changes to debugfs that implement the new primitive:
debugfs_create_automount(), that creates a directory in debugfs that will
safely mount another file system automatically when debugfs is mounted.
This will let tracefs automount itself on top of debugfs/tracing directory.
tracing_init_dentry() will soon return NULL as a valid pointer for the
top level tracing directroy. NULL can not be used as an error value.
Instead, switch to ERR_PTR() and check the return status with
IS_ERR().
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Using just the filter for checking for trampolines or regs is not enough
when updating the code against the records that represent all functions.
Both the filter hash and the notrace hash need to be checked.
To trigger this bug (using trace-cmd and perf):
# perf probe -a do_fork
# trace-cmd start -B foo -e probe
# trace-cmd record -p function_graph -n do_fork sleep 1
The trace-cmd record at the end clears the filter before it disables
function_graph tracing and then that causes the accounting of the
ftrace function records to become incorrect and causes ftrace to bug.
Link: http://lkml.kernel.org/r/20150114154329.358378039@goodmis.org
Cc: stable@vger.kernel.org
[ still need to switch old_hash_ops to old_ops_hash ]
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
As the set_ftrace_filter affects both the function tracer as well as the
function graph tracer, the ops that represent each have a shared
ftrace_ops_hash structure. This allows both to be updated when the filter
files are updated.
But if function graph is enabled and the global_ops (function tracing) ops
is not, then it is possible that the filter could be changed without the
update happening for the function graph ops. This will cause the changes
to not take place and may even cause a ftrace_bug to occur as it could mess
with the trampoline accounting.
The solution is to check if the ops uses the shared global_ops filter and
if the ops itself is not enabled, to check if there's another ops that is
enabled and also shares the global_ops filter. In that case, the
modification still needs to be executed.
Link: http://lkml.kernel.org/r/20150114154329.055980438@goodmis.org
Cc: stable@vger.kernel.org # 3.17+
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Introduce FTRACE_OPS_FL_IPMODIFY to avoid conflict among
ftrace users who may modify regs->ip to change the execution
path. If two or more users modify the regs->ip on the same
function entry, one of them will be broken. So they must add
IPMODIFY flag and make sure that ftrace_set_filter_ip() succeeds.
Note that ftrace doesn't allow ftrace_ops which has IPMODIFY
flag to have notrace hash, and the ftrace_ops must have a
filter hash (so that the ftrace_ops can hook only specific
entries), because it strongly depends on the address and
must be allowed for only few selected functions.
Link: http://lkml.kernel.org/r/20141121102516.11844.27829.stgit@localhost.localdomain
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ fixed up some of the comments ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Fix up a few typos in comments and convert an int into a bool in
update_traceon_count().
Link: http://lkml.kernel.org/r/546DD445.5080108@hitachi.com
Suggested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Stack traces that happen from function tracing check if the address
on the stack is a __kernel_text_address(). That is, is the address
kernel code. This calls core_kernel_text() which returns true
if the address is part of the builtin kernel code. It also calls
is_module_text_address() which returns true if the address belongs
to module code.
But what is missing is ftrace dynamically allocated trampolines.
These trampolines are allocated for individual ftrace_ops that
call the ftrace_ops callback functions directly. But if they do a
stack trace, the code checking the stack wont detect them as they
are neither core kernel code nor module address space.
Adding another field to ftrace_ops that also stores the size of
the trampoline assigned to it we can create a new function called
is_ftrace_trampoline() that returns true if the address is a
dynamically allocate ftrace trampoline. Note, it ignores trampolines
that are not dynamically allocated as they will return true with
the core_kernel_text() function.
Link: http://lkml.kernel.org/r/20141119034829.497125839@goodmis.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Using seq_printf to print a simple string or a single character is a
lot more expensive than it needs to be, since seq_puts and seq_putc
exist.
These patches do
seq_printf(m, s) -> seq_puts(m, s)
seq_printf(m, "%s", s) -> seq_puts(m, s)
seq_printf(m, "%c", c) -> seq_putc(m, c)
Subsequent patches will simplify further.
Link: http://lkml.kernel.org/r/1415479332-25944-2-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
With the new logic, if only a single user of ftrace function hooks is
used, it will get its own trampoline assigned to it.
The problem is that the control_ops is an indirect ops that perf ops
uses. What that means is that when perf registers its ops with
register_ftrace_function(), it has the CONTROL flag set and gets added
to the control list instead of the global ftrace list. The control_ops
gets added to that instead and the mcount trampoline calls the control_ops
function. The control_ops function will iterate the control list and
call the ops functions that are attached to it.
But currently the trampoline is added to the perf ops and not the
control ops, and when ftrace tries to find a trampoline hook for it,
it fails to find one and gives the following splat:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 10133 at kernel/trace/ftrace.c:2033 ftrace_get_addr_new+0x6f/0xc0()
Modules linked in: [...]
CPU: 0 PID: 10133 Comm: perf Tainted: P 3.18.0-rc1-test+ #388
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
00000000000007f1 ffff8800c2643bc8 ffffffff814fca6e ffff88011ea0ed01
0000000000000000 ffff8800c2643c08 ffffffff81041ffd 0000000000000000
ffffffff810c388c ffffffff81a5a350 ffff880119b00000 ffffffff810001c8
Call Trace:
[<ffffffff814fca6e>] dump_stack+0x46/0x58
[<ffffffff81041ffd>] warn_slowpath_common+0x81/0x9b
[<ffffffff810c388c>] ? ftrace_get_addr_new+0x6f/0xc0
[<ffffffff810001c8>] ? 0xffffffff810001c8
[<ffffffff81042031>] warn_slowpath_null+0x1a/0x1c
[<ffffffff810c388c>] ftrace_get_addr_new+0x6f/0xc0
[<ffffffff8102e938>] ftrace_replace_code+0xd6/0x334
[<ffffffff810c4116>] ftrace_modify_all_code+0x41/0xc5
[<ffffffff8102eba6>] arch_ftrace_update_code+0x10/0x19
[<ffffffff810c293c>] ftrace_run_update_code+0x21/0x42
[<ffffffff810c298f>] ftrace_startup_enable+0x32/0x34
[<ffffffff810c3049>] ftrace_startup+0x14e/0x15a
[<ffffffff810c307c>] register_ftrace_function+0x27/0x40
[<ffffffff810dc118>] perf_ftrace_event_register+0x3e/0xee
[<ffffffff810dbfbe>] perf_trace_init+0x29d/0x2a9
[<ffffffff810eb422>] perf_tp_event_init+0x27/0x3a
[<ffffffff810f18bc>] perf_init_event+0x9e/0xed
[<ffffffff810f1ba4>] perf_event_alloc+0x299/0x330
[<ffffffff810f236b>] SYSC_perf_event_open+0x3ee/0x816
[<ffffffff8115a066>] ? mntput+0x2d/0x2f
[<ffffffff81142b00>] ? __fput+0xa7/0x1b2
[<ffffffff81091300>] ? do_gettimeofday+0x22/0x3a
[<ffffffff810f279c>] SyS_perf_event_open+0x9/0xb
[<ffffffff81502a92>] system_call_fastpath+0x12/0x17
---[ end trace 81a53565150e4982 ]---
Bad trampoline accounting at: ffffffff810001c8 (run_init_process+0x0/0x2d) (10000001)
Update the control_ops trampoline instead of the perf ops one.
Reported-by: lkp@01.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
With the introduction of the dynamic trampolines, it is useful that if
things go wrong that ftrace_bug() produces more information about what
the current state is. This can help debug issues that may arise.
Ftrace has lots of checks to make sure that the state of the system it
touchs is exactly what it expects it to be. When it detects an abnormality
it calls ftrace_bug() and disables itself to prevent any further damage.
It is crucial that ftrace_bug() produces sufficient information that
can be used to debug the situation.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When the static ftrace_ops (like function tracer) enables tracing, and it
is the only callback that is referencing a function, a trampoline is
dynamically allocated to the function that calls the callback directly
instead of calling a loop function that iterates over all the registered
ftrace ops (if more than one ops is registered).
But when it comes to dynamically allocated ftrace_ops, where they may be
freed, on a CONFIG_PREEMPT kernel there's no way to know when it is safe
to free the trampoline. If a task was preempted while executing on the
trampoline, there's currently no way to know when it will be off that
trampoline.
But this is not true when it comes to !CONFIG_PREEMPT. The current method
of calling schedule_on_each_cpu() will force tasks off the trampoline,
becaues they can not schedule while on it (kernel preemption is not
configured). That means it is safe to free a dynamically allocated
ftrace ops trampoline when CONFIG_PREEMPT is not configured.
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The file /sys/kernel/debug/tracing/eneabled_functions is used to debug
ftrace function hooks. Add to the output what function is being called
by the trampoline if the arch supports it.
Add support for this feature in x86_64.
Cc: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The current method of handling multiple function callbacks is to register
a list function callback that calls all the other callbacks based on
their hash tables and compare it to the function that the callback was
called on. But this is very inefficient.
For example, if you are tracing all functions in the kernel and then
add a kprobe to a function such that the kprobe uses ftrace, the
mcount trampoline will switch from calling the function trace callback
to calling the list callback that will iterate over all registered
ftrace_ops (in this case, the function tracer and the kprobes callback).
That means for every function being traced it checks the hash of the
ftrace_ops for function tracing and kprobes, even though the kprobes
is only set at a single function. The kprobes ftrace_ops is checked
for every function being traced!
Instead of calling the list function for functions that are only being
traced by a single callback, we can call a dynamically allocated
trampoline that calls the callback directly. The function graph tracer
already uses a direct call trampoline when it is being traced by itself
but it is not dynamically allocated. It's trampoline is static in the
kernel core. The infrastructure that called the function graph trampoline
can also be used to call a dynamically allocated one.
For now, only ftrace_ops that are not dynamically allocated can have
a trampoline. That is, users such as function tracer or stack tracer.
kprobes and perf allocate their ftrace_ops, and until there's a safe
way to free the trampoline, it can not be used. The dynamically allocated
ftrace_ops may, although, use the trampoline if the kernel is not
compiled with CONFIG_PREEMPT. But that will come later.
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When modifying code, ftrace has several checks to make sure things
are being done correctly. One of them is to make sure any code it
modifies is exactly what it expects it to be before it modifies it.
In order to do so with the new trampoline logic, it must be able
to find out what trampoline a function is hooked to in order to
see if the code that hooks to it is what's expected.
The logic to find the trampoline from a record (accounting descriptor
for a function that is hooked) needs to only look at the "old_hash"
of an ops that is being modified. The old_hash is the list of function
an ops is hooked to before its update. Since a record would only be
pointing to an ops that is being modified if it was already hooked
before.
Currently, it can pick a modified ops based on its new functions it
will be hooked to, and this picks the wrong trampoline and causes
the check to fail, disabling ftrace.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ftrace: squash into ordering of ops for modification
The code that checks for trampolines when modifying function hooks
tests against a modified ops "old_hash". But the ops old_hash pointer
is not being updated before the changes are made, making it possible
to not find the right hash to the callback and possibly causing
ftrace to break in accounting and disable itself.
Have the ops set its old_hash before the modifying takes place.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When the last ftrace_ops is unregistered, all the function records should
have a zeroed flags value. Make sure that is the case when the last ftrace_ops
is unregistered.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Allowing function callbacks to declare their own trampolines requires
that each ftrace_ops that has a trampoline must have some sort of
accounting that keeps track of which ops has a trampoline attached
to a record.
The easy way to solve this was to add a "tramp_hash" that created a
hash entry for every function that a ops uses with a trampoline.
But since we can have literally tens of thousands of functions being
traced, that means we need tens of thousands of descriptors to map
the ops to the function in the hash. This is quite expensive and
can cause enabling and disabling the function graph tracer to take
some time to start and stop. It can take up to several seconds to
disable or enable all functions in the function graph tracer for this
reason.
The better approach albeit more complex, is to keep track of how ops
are being enabled and disabled, and use that along with the counting
of the number of ops attached to records, to determive what ops has
a trampoline attached to a record at enabling and disabling of
tracing.
To do this, the tramp_hash has been replaced with an old_filter_hash
and old_notrace_hash, which get the copy of the ops filter_hash and
notrace_hash respectively. The old hashes is kept until the ops has
been modified or removed and the old hashes are used with the logic
of the accounting to determine the ops that have the trampoline of
a record. The reason this has less of a footprint is due to the trick
that an "empty" hash in the filter_hash means "all functions" and
an empty hash in the notrace hash means "no functions" in the hash.
This is much more efficienct, doesn't have the delay, and takes up
much less memory, as we do not need to map all the functions but
just figure out which functions are mapped at the time it is
enabled or disabled.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add three new flags for ftrace_ops:
FTRACE_OPS_FL_ADDING
FTRACE_OPS_FL_REMOVING
FTRACE_OPS_FL_MODIFYING
These will be set for the ftrace_ops when they are first added
to the function tracing, being removed from function tracing
or just having their functions changed from function tracing,
respectively.
This will be needed to remove the tramp_hash, which can grow quite
big. The tramp_hash is used to note what functions a ftrace_ops
is using a trampoline for. Denoting which ftrace_ops is being
modified, will allow us to use the ftrace_ops hashes themselves,
which are much smaller as they have a global flag to denote if
a ftrace_ops is tracing all functions, as well as a notrace hash
if the ftrace_ops is tracing all but a few. The tramp_hash just
creates a hash item for every function, which can go into the 10s
of thousands if all functions are using the ftrace_ops trampoline.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When dumping the enabled_functions, use the first op that is
found with a trampoline to the record, as there should only be
one, as only one ops can be registered to a function that has
a trampoline.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ftrace_hash_move() currently frees the old hash that is passed to it
after replacing the pointer with the new hash. Instead of having the
function do that chore, have the caller perform the free.
This lets the ftrace_hash_move() be used a bit more freely, which
is needed for changing the way the trampoline logic is done.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The clean up that adds the helper function ftrace_ops_get_func()
caused the default function to not change when DYNAMIC_FTRACE was not
set and no ftrace_ops were registered. Although static tracing is
not very useful (not having DYNAMIC_FTRACE set), it is still supported
and we don't want to break it.
Clean up the if statement even more to specifically have the default
function call ftrace_stub when no ftrace_ops are registered. This
fixes the small bug for static tracing as well as makes the code a
bit more understandable.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>