* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (182 commits)
[SCSI] aacraid: add an ifdef'd device delete case instead of taking the device offline
[SCSI] aacraid: prohibit access to array container space
[SCSI] aacraid: add support for handling ATA pass-through commands.
[SCSI] aacraid: expose physical devices for models with newer firmware
[SCSI] aacraid: respond automatically to volumes added by config tool
[SCSI] fcoe: fix fcoe module ref counting
[SCSI] libfcoe: FIP Keep-Alive messages for VPorts are sent with incorrect port_id and wwn
[SCSI] libfcoe: Fix incorrect MAC address clearing
[SCSI] fcoe: fix a circular locking issue with rtnl and sysfs mutex
[SCSI] libfc: Move the port_id into lport
[SCSI] fcoe: move link speed checking into its own routine
[SCSI] libfc: Remove extra pointer check
[SCSI] libfc: Remove unused fc_get_host_port_type
[SCSI] fcoe: fixes wrong error exit in fcoe_create
[SCSI] libfc: set seq_id for incoming sequence
[SCSI] qla2xxx: Updates to ISP82xx support.
[SCSI] qla2xxx: Optionally disable target reset.
[SCSI] qla2xxx: ensure flash operation and host reset via sg_reset are mutually exclusive
[SCSI] qla2xxx: Silence bogus warning by gcc for wrap and did.
[SCSI] qla2xxx: T10 DIF support added.
...
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (311 commits)
perf tools: Add mode to build without newt support
perf symbols: symbol inconsistency message should be done only at verbose=1
perf tui: Add explicit -lslang option
perf options: Type check all the remaining OPT_ variants
perf options: Type check OPT_BOOLEAN and fix the offenders
perf options: Check v type in OPT_U?INTEGER
perf options: Introduce OPT_UINTEGER
perf tui: Add workaround for slang < 2.1.4
perf record: Fix bug mismatch with -c option definition
perf options: Introduce OPT_U64
perf tui: Add help window to show key associations
perf tui: Make <- exit menus too
perf newt: Add single key shortcuts for zoom into DSO and threads
perf newt: Exit browser unconditionally when CTRL+C, q or Q is pressed
perf newt: Fix the 'A'/'a' shortcut for annotate
perf newt: Make <- exit the ui_browser
x86, perf: P4 PMU - fix counters management logic
perf newt: Make <- zoom out filters
perf report: Report number of events, not samples
perf hist: Clarify events_stats fields usage
...
Fix up trivial conflicts in kernel/fork.c and tools/perf/builtin-record.c
__print_hex() prints values in an array in hex (w/o '0x') (space separated)
EX) 92 33 32 f3 ee 4d
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@jp.fujitsu.com>
Signed-off-by: Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Trace events can be defined from a template using
DECLARE_EVENT_CLASS/DEFINE_EVENT or directly with TRACE_EVENT.
In both cases we have a template tracepoint handler, used to
record the trace, to which we pass our ftrace event instance.
In the function level, if the class is named "foo" and the event
is named "blah", we have the following chain of calls:
perf_trace_blah() -> perf_trace_templ_foo()
In the case we have several events sharing the class "blah",
we'll have multiple users of perf_trace_templ_foo(), and it
won't be inlined by the compiler. This is usually what happens
with the DECLARE_EVENT_CLASS/DEFINE_EVENT based definition.
But if perf_trace_blah() is the only caller of perf_trace_templ_foo()
there are fair chances that it will be inlined.
The problem is that we fetch the regs from perf_trace_templ_foo()
after we rewinded the frame pointer to the second caller, we want
to reach the caller of perf_trace_blah() to get the right source
of the event. And we do this by always assuming that
perf_trace_templ_foo() is not inlined. But as shown above this
is not always true. And if it is inlined we miss the first caller,
losing the most important level of precision.
We get:
61.31% ls [kernel.kallsyms] [k] do_softirq
|
--- do_softirq
irq_exit
do_IRQ
common_interrupt
|
|--25.00%-- tty_buffer_request_room
Instead of:
61.31% ls [kernel.kallsyms] [k] __do_softirq
|
--- __do_softirq
do_softirq
irq_exit
do_IRQ
common_interrupt
|
|--25.00%-- tty_buffer_request_room
To fix this, we fetch the regs from perf_trace_blah() rather than
perf_trace_templ_foo() so that we don't have to deal with inlining
surprises.
That also bring us the advantage of having the true source of the
event even if we don't have frame pointers.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Make some comments consistent with the code.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4BA97FD0.7090202@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits)
perf: Fix unexported generic perf_arch_fetch_caller_regs
perf record: Don't try to find buildids in a zero sized file
perf: export perf_trace_regs and perf_arch_fetch_caller_regs
perf, x86: Fix hw_perf_enable() event assignment
perf, ppc: Fix compile error due to new cpu notifiers
perf: Make the install relative to DESTDIR if specified
kprobes: Calculate the index correctly when freeing the out-of-line execution slot
perf tools: Fix sparse CPU numbering related bugs
perf_event: Fix oops triggered by cpu offline/online
perf: Drop the obsolete profile naming for trace events
perf: Take a hot regs snapshot for trace events
perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot
perf/x86-64: Use frame pointer to walk on irq and process stacks
lockdep: Move lock events under lockdep recursion protection
perf report: Print the map table just after samples for which no map was found
perf report: Add multiple event support
perf session: Change perf_session post processing functions to take histogram tree
perf session: Add storage for seperating event types in report
perf session: Change add_hist_entry to take the tree root instead of session
perf record: Add ID and to recorded event data when recording multiple events
...
Drop the obsolete "profile" naming used by perf for trace events.
Perf can now do more than simple events counting, so generalize
the API naming.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
We are taking a wrong regs snapshot when a trace event triggers.
Either we use get_irq_regs(), which gives us the interrupted
registers if we are in an interrupt, or we use task_pt_regs()
which gives us the state before we entered the kernel, assuming
we are lucky enough to be no kernel thread, in which case
task_pt_regs() returns the initial set of regs when the kernel
thread was started.
What we want is different. We need a hot snapshot of the regs,
so that we can get the instruction pointer to record in the
sample, the frame pointer for the callchain, and some other
things.
Let's use the new perf_fetch_caller_regs() for that.
Comparison with perf record -e lock: -R -a -f -g
Before:
perf [kernel] [k] __do_softirq
|
--- __do_softirq
|
|--55.16%-- __open
|
--44.84%-- __write_nocancel
After:
perf [kernel] [k] perf_tp_event
|
--- perf_tp_event
|
|--41.07%-- lock_acquire
| |
| |--39.36%-- _raw_spin_lock
| | |
| | |--7.81%-- hrtimer_interrupt
| | | smp_apic_timer_interrupt
| | | apic_timer_interrupt
The old case was producing unreliable callchains. Now having
right frame and instruction pointers, we have the trace we
want.
Also syscalls and kprobe events already have the right regs,
let's use them instead of wasting a retrieval.
v2: Follow the rename perf_save_regs() -> perf_fetch_caller_regs()
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Archs <linux-arch@vger.kernel.org>
GCC 4.5 introduces behavior that forces the alignment of structures to
use the largest possible value. The default value is 32 bytes, so if
some structures are defined with a 4-byte alignment and others aren't
declared with an alignment constraint at all - it will align at 32-bytes.
For things like the ftrace events, this results in a non-standard array.
When initializing the ftrace subsystem, we traverse the _ftrace_events
section and call the initialization callback for each event. When the
structures are misaligned, we could be treating another part of the
structure (or the zeroed out space between them) as a function pointer.
This patch forces the alignment for all the ftrace_event_call structures
to 4 bytes.
Without this patch, the kernel fails to boot very early when built with
gcc 4.5.
It's trivial to check the alignment of the members of the array, so it
might be worthwhile to add something to the build system to do that
automatically. Unfortunately, that only covers this case. I've asked one
of the gcc developers about adding a warning when this condition is seen.
Cc: stable@kernel.org
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
LKML-Reference: <4B85770B.6010901@suse.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The functions used to implement the TRACE_EVENT macro show up in
function tracing. This is considered a distraction, and these should
not be displayed. For example:
<idle>-0 [000] 57.202149: task_of <-update_stats_wait_end
<idle>-0 [000] 57.202149: ftrace_raw_event_sched_stat_wait <-update_stats_wait_end
<idle>-0 [000] 57.202150: ftrace_raw_event_id_sched_stat_template <-ftrace_raw_event_sched_stat_wait
<idle>-0 [000] 57.202150: sched_stat_wait: comm=sshd pid=2735 delay=19207 [ns]
The "ftrace_raw_event_*" traces are just the utility functions used
by TRACE_EVENT tracepoints.
Cc: Thomas Gleixner <tglx@linutronix.de>
Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Introduce ftrace_perf_buf_prepare() and ftrace_perf_buf_submit() to
gather the common code that operates on raw events sampling buffer.
This cleans up redundant code between regular trace events, syscall
events and kprobe events.
Changelog v1->v2:
- Rename function name as per Masami and Frederic's suggestion
- Add __kprobes for ftrace_perf_buf_prepare() and make
ftrace_perf_buf_submit() inline as per Masami's suggestion
- Export ftrace_perf_buf_prepare since modules will use it
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <4B60E92D.9000808@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
The previous patches added the use of print_fmt string and changes
the trace_define_field() function to also create the fields and
format output for the event format files.
text data bss dec hex filename
5857201 1355780 9336808 16549789 fc879d vmlinux
5884589 1351684 9337896 16574169 fce6d9 vmlinux-orig
The above shows the size of the vmlinux after this patch set
compared to the vmlinux-orig which is before the patch set.
This saves us 27k on text, 1k on bss and adds just 4k of data.
The total savings of 24k in size.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4B273D4D.40604@cn.fujitsu.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This is part of a patch set that removes the show_format method
in the ftrace event macros.
The print_fmt field is added to hold the string that shows
the print_fmt in the event format files. This patch only adds
the field but it is currently not used. Later patches will use
this field to enable us to remove the show_format field
and function.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4B273D3E.2000704@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add is_signed_type() call to trace_define_field() in ftrace macros.
The code previously just passed in 0 (false), disregarding whether
or not the field was actually a signed type.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4B273D3A.6020007@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Quoted from Ingo:
| This reminds me - i think we should eliminate CONFIG_EVENT_PROFILE -
| it's an unnecessary Kconfig complication. If both PERF_EVENTS and
| EVENT_TRACING is enabled we should expose generic tracepoints.
|
| Nor is it limited to event 'profiling', so it has become a misnomer as
| well.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <4B2F1557.2050705@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Like total_profile_count, struct ftrace_event_call::profile_count
is protected by event_mutex, so it doesn't need to be atomic_t.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <4B1DC549.5010705@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Move the printk from each ftrace_raw_reg_event_foo() to
its caller ftrace_event_enable_disable(). This avoids each
regfunc trace event callbacks to handle a same error report
that can be carried from the caller.
See how much space this saves:
text data bss dec hex filename
5345151 1961864 7103260 14410275 dbe223 vmlinux.o.old
5331487 1961864 7103260 14396611 dbacc3 vmlinux.o
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@redhat.com>
LKML-Reference: <4B1DC4AC.802@cn.fujitsu.com>
[start cmdline record before calling regfunc to avoid lost
window of pid to comm resolution]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Call trace_define_common_fields() in event_create_dir() only.
This avoids trace events to handle it from their define_fields
callbacks and shrinks the kernel code size:
text data bss dec hex filename
5346802 1961864 7103260 14411926 dbe896 vmlinux.o.old
5345151 1961864 7103260 14410275 dbe223 vmlinux.o
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <4B1DC49C.8000107@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Use a generic trace_event_raw_init() function for all event's raw_init
callbacks (but kprobes) instead of defining the same version for each
of these.
This shrinks the kernel code:
text data bss dec hex filename
5355293 1961928 7103260 14420481 dc0a01 vmlinux.o.old
5346802 1961864 7103260 14411926 dbe896 vmlinux.o
raw_init can't be removed, because ftrace events and kprobe events
use different raw_init callbacks. Though it's possible to totally
remove raw_init, I choose to leave it as it is for now.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <4B1DC48C.7080603@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
It is not quite obvious at first sight what TRACE_EVENT_TEMPLATE
does: does it define an event as well beyond defining a template?
To clarify this, rename it to DECLARE_EVENT_CLASS, which follows
the various 'DECLARE_*()' idioms we already have in the kernel:
DECLARE_EVENT_CLASS(class)
DEFINE_EVENT(class, event1)
DEFINE_EVENT(class, event2)
DEFINE_EVENT(class, event3)
To complete this logic we should also rename TRACE_EVENT() to:
DEFINE_SINGLE_EVENT(single_event)
... but in a more quiet moment of the kernel cycle.
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B0E286A.2000405@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After creating the TRACE_EVENT_TEMPLATE I started to look at other
trace points to see what duplication was made. I noticed that there
are several trace points where they are almost identical except for
the name and the output format. Since TRACE_EVENT_TEMPLATE was successful
in bringing down the size of trace events, I added a DEFINE_EVENT_PRINT.
DEFINE_EVENT_PRINT is used just like DEFINE_EVENT is. That is, the
DEFINE_EVENT_PRINT also uses a TRACE_EVENT_TEMPLATE, but it allows the
developer to overwrite the print format. If there are two or more
TRACE_EVENTS that are identical except for the name and print, then
they can be converted to use a TRACE_EVENT_TEMPLATE. Since the
TRACE_EVENT_TEMPLATE already does the print output, the first trace event
would have its print format held in the TRACE_EVENT_TEMPLATE and
be defined with a DEFINE_EVENT. The rest will use the DEFINE_EVENT_PRINT
and override the print format.
Converting the sched trace points to both DEFINE_EVENT and
DEFINE_EVENT_PRINT. Five were converted to DEFINE_EVENT and two were
converted to DEFINE_EVENT_PRINT.
I was able to get the following:
$ size kernel/sched.o-*
text data bss dec hex filename
79299 6776 2520 88595 15a13 kernel/sched.o-notrace
101941 11896 2584 116421 1c6c5 kernel/sched.o-templ
104779 11896 2584 119259 1d1db kernel/sched.o-trace
sched.o-notrace is the scheduler compiled with no trace points.
sched.o-templ is with the use of DEFINE_EVENT and DEFINE_EVENT_PRINT
sched.o-trace is the current trace events.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There are some places in the kernel that define several tracepoints and
they are all identical besides the name. The code to enable, disable and
record is created for every trace point even if most of the code is
identical.
This patch adds TRACE_EVENT_TEMPLATE that lets the developer create
a template TRACE_EVENT and create trace points with DEFINE_EVENT, which
is based off of a given template. Each trace point used by this
will share most of the code, and bring down the size of the kernel
when there are several duplicate events.
Usage is:
TRACE_EVENT_TEMPLATE(name, proto, args, tstruct, assign, print);
Which would be the same as defining a normal TRACE_EVENT.
To create the trace events that the trace points will use:
DEFINE_EVENT(template, name, proto, args) is done. The template
is the name of the TRACE_EVENT_TEMPLATE to use. The name is the
name of the trace point. The parameters proto and args must be the same
as the proto and args of the template. If they are not the same,
then a compile error will result. I tried hard removing this duplication
but the C preprocessor is not powerful enough (or my CPP magic
experience points is not at a high enough level) to not need them.
A lot of trace events are coming in with new XFS development. Most of
the trace points are identical except for the name. The following shows
the advantage of having TRACE_EVENT_TEMPLATE:
$ size fs/xfs/xfs.o.*
text data bss dec hex filename
452114 2788 3520 458422 6feb6 fs/xfs/xfs.o.old
638482 38116 3744 680342 a6196 fs/xfs/xfs.o.template
996954 38116 4480 1039550 fdcbe fs/xfs/xfs.o.trace
xfs.o.old is without any tracepoints.
xfs.o.template uses the new TRACE_EVENT_TEMPLATE.
xfs.o.trace uses the current TRACE_EVENT macros.
Requested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Make perf_swevent_get_recursion_context return a context number
and disable preemption.
This could be used to remove the IRQ disable from the trace bit
and index the per-cpu buffer with.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20091123103819.993226816@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we commit a trace to perf, we first check if we are
recursing in the same buffer so that we don't mess-up the buffer
with a recursing trace. But later on, we do the same check from
perf to avoid commit recursion. The recursion check is desired
early before we touch the buffer but we want to do this check
only once.
Then export the recursion protection from perf and use it from
the trace events before submitting a trace.
v2: Put appropriate Reported-by tag
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
LKML-Reference: <1258864015-10579-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For some reason the export of the event print format to userspace
uses '#fmt' which breaks if the format string is anything but a plain
string, for example if it is built with macros then the macro names
are exported instead of their contents.
Use
"\"%s\"", fmt
instead of
"%s", #fmt
to export the string and not the way it is built.
For example, in net/mac80211/driver-trace.h for the trace event drv_start
there is:
TP_printk(
LOCAL_PR_FMT, LOCAL_PR_ARG
)
Which use to produce:
print fmt: LOCAL_PR_FMT, REC->wiphy_name
Now produces:
print fmt: "%s", REC->wiphy_name
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
LKML-Reference: <20091113224009.GB23942@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
While tracing using events with perf, if one enables the
lockdep:lock_acquire event, it will infect every other perf
trace events.
Basically, you can enable whatever set of trace events through
perf but if this event is part of the set, the only result we
can get is a long list of lock_acquire events of rcu read lock,
and only that.
This is because of a recursion inside perf.
1) When a trace event is triggered, it will fill a per cpu
buffer and submit it to perf.
2) Perf will commit this event but will also protect some data
using rcu_read_lock
3) A recursion appears: rcu_read_lock triggers a lock_acquire
event that will fill the per cpu event and then submit the
buffer to perf.
4) Perf detects a recursion and ignores it
5) Perf continues its work on the previous event, but its buffer
has been overwritten by the lock_acquire event, it has then
been turned into a lock_acquire event of rcu read lock
Such scenario also happens with lock_release with
rcu_read_unlock().
We could turn the rcu_read_lock() into __rcu_read_lock() to drop
the lock debugging from perf fast path, but that would make us
lose the rcu debugging and that doesn't prevent from other
possible kind of recursion from perf in the future.
This patch adds a recursion protection based on a counter on the
perf trace per cpu buffers to solve the problem.
-v2: Fixed lost whitespace, added reviewed-by tag
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jason Baron <jbaron@redhat.com>
LKML-Reference: <1257477185-7838-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
tools/perf/Makefile
Merge reason:
- fix the conflict
- pick up the pr_*() infrastructure to queue up dependent patch
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The sign info used for filters in the kernel is also useful to
applications that process the trace stream. Add it to the format
files and make it available to userspace.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: rostedt@goodmis.org
Cc: lizf@cn.fujitsu.com
Cc: hch@infradead.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1254809398-8078-2-git-send-email-tzanussi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'perfcounters-rename-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Tidy up after the big rename
perf: Do the big rename: Performance Counters -> Performance Events
perf_counter: Rename 'event' to event_id/hw_event
perf_counter: Rename list_entry -> group_entry, counter_list -> group_list
Manually resolved some fairly trivial conflicts with the tracing tree in
include/trace/ftrace.h and kernel/trace/trace_syscalls.c.
Bye-bye Performance Counters, welcome Performance Events!
In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.
Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.
All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)
The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.
Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.
User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)
This patch has been generated via the following script:
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES
for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done
FILES=$(find . -name perf_event.*)
sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES
... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.
Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.
( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )
Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently the trace event profile buffer is allocated in the stack. But
this may be too much for the stack, as the events can have large
statically defined field size and can also grow with dynamic arrays.
Allocate two per cpu buffer for all profiled events. The first cpu
buffer is used to host every non-nmi context traces. It is protected
by disabling the interrupts while writing and committing the trace.
The second buffer is reserved for nmi. So that there is no race between
them and the first buffer.
The whole write/commit section is rcu protected because we release
these buffers while deactivating the last profiling trace event.
v2: Move the buffers from trace_event to be global, as pointed by
Steven Rostedt.
v3: Fix the syscall events to handle the profiling buffer races
by disabling interrupts, now that the buffers are globals.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Factorize the events enabling accounting in a common tracing core
helper. This reduces the size of the profile_enable() and
profile_disable() callbacks for each trace events.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
The generated functions of TRACE_EVENT uses "flags" in one of the
sub macros which shadows a parameter in the outside macro.
Simple fix is to make the submacro use __flags instead.
Discovered by sparse.
Reported-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Some of the generated functions used in the TRACE_EVENT macros are
not declared static, but they are not global.
Discovered by sparse.
Reported-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Conflicts:
kernel/trace/trace_export.c
kernel/trace/trace_kprobe.c
Merge reason: This topic branch lacks an important
build fix in tracing/core:
0dd7b74787:
tracing: Fix double CPP substitution in TRACE_EVENT_FN
that prevents from multiple tracepoint headers inclusion crashes.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
The latency tracers (irqsoff and wakeup) can swap trace buffers
on the fly. If an event is happening and has reserved data on one of
the buffers, and the latency tracer swaps the global buffer with the
max buffer, the result is that the event may commit the data to the
wrong buffer.
This patch changes the API to the trace recording to be recieve the
buffer that was used to reserve a commit. Then this buffer can be passed
in to the commit.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
init_preds() allocates about 5392 bytes of memory (on x86_32) for
a TRACE_EVENT. With my config, at system boot total memory occupied
is:
5392 * (642 + 15) == 3459KB
642 == cat available_events | wc -l
15 == number of dirs in events/ftrace
That's quite a lot, so we'd better defer memory allocation util
it's needed, that's when filter is used.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <4A9B8EA5.6020700@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
TRACE_EVENT_FN relays on TRACE_EVENT by reprocessing its parameters
into the ftrace events CPP macro. This leads to a double substitution
in some cases.
For example, a bad consequence is a format always prefixed by
"%s, %s\n" for every TRACE_EVENT_FN based events.
Eg:
cat /debug/tracing/events/syscalls/sys_enter/format
[...]
print fmt: "%s, %s\n", "\"NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)\"",\
"REC->id, REC->args[0], REC->args[1], REC->args[2], REC->args[3],\
REC->args[4], REC->args[5]"
This creates a failure in post-processing tools such as perf trace or
trace-cmd.
Then drop this double substitution and replace it by a new __cpparg()
macro that relays CPP arguments containing commas.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Stone <jistone@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
LKML-Reference: <1251413406-6704-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add dynamic ftrace_event_call support to ftrace. Trace engines can add
new ftrace_event_call to ftrace on the fly. Each operator function of
the call takes an ftrace_event_call data structure as an argument,
because these functions may be shared among several ftrace_event_calls.
Changes from v13:
- Define remove_subsystem_dir() always (revirt a2ca5e03), because
trace_remove_event_call() uses it.
- Modify syscall tracer because of ftrace_event_call change.
[fweisbec@gmail.com: Fixed conflict against latest tracing/core]
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Przemysław Pawełczyk <przemyslaw@pawelczyk.it>
Cc: Roland McGrath <roland@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
LKML-Reference: <20090813203453.31965.71901.stgit@localhost.localdomain>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Add __field_ext(), so a field can be assigned to a specific
filter_type, which matches a corresponding filter function.
For example, a later patch will allow this:
__field_ext(const char *, str, FILTER_PTR_STR);
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A7B9272.6050709@cn.fujitsu.com>
[
Fixed a -1 to FILTER_OTHER
Forward ported to latest kernel.
]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
It's not strictly correct for the tracepoint reg/unreg callbacks to
occur when a client is hooking up, because the actual tracepoint may not
be present yet. This happens to be fine for syscall, since that's in
the core kernel, but it would cause problems for tracepoints defined in
a module that hasn't been loaded yet. It also means the reg/unreg has
to be EXPORTed for any modules to use the tracepoint (as in SystemTap).
This patch removes DECLARE_TRACE_WITH_CALLBACK, and instead introduces
DEFINE_TRACE_FN which stores the callbacks in struct tracepoint. The
callbacks are used now when the active state of the tracepoint changes
in set_tracepoint & disable_tracepoint.
This also introduces TRACE_EVENT_FN, so ftrace events can also provide
registration callbacks if needed.
Signed-off-by: Josh Stone <jistone@redhat.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Jiaying Zhang <jiayingz@google.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <1251150194-1713-4-git-send-email-jistone@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Extract duplicate code. Also prepare for the later patch.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A8BAFB8.1010304@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This parameter is needed by syscall events to add define_fields()
handler.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A8BAF90.6060801@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>