License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
The criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                            # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                        270
GPL-2.0+ WITH Linux-syscall-note                       169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)     21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)     17
LGPL-2.1+ WITH Linux-syscall-note                       15
GPL-1.0+ WITH Linux-syscall-note                        14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)     5
LGPL-2.0+ WITH Linux-syscall-note                        4
LGPL-2.1 WITH Linux-syscall-note                         3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)               3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)              1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
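The tagging rule described in the message above (a plain 'GPL-2.0' tag, the syscall-note exception for */uapi/* paths, and a different comment type for headers and .c files) can be sketched roughly as follows. This is not the script Thomas and Greg used. The helper names `has_suffix` and `spdx_tag_line` are hypothetical, and the choice of `/* */` for headers and `//` for .c files is taken from the kernel's documented SPDX convention rather than from this commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: does `path` end with `suffix`? */
static int has_suffix(const char *path, const char *suffix)
{
	size_t plen = strlen(path), slen = strlen(suffix);

	return plen >= slen && strcmp(path + plen - slen, suffix) == 0;
}

/*
 * Hypothetical helper: format the SPDX tag line for a file that had no
 * license information. Paths containing "/uapi/" get the syscall-note
 * exception; headers get a block comment, other files a line comment.
 */
static int spdx_tag_line(const char *path, char *buf, size_t len)
{
	const char *id = strstr(path, "/uapi/") ?
		"GPL-2.0 WITH Linux-syscall-note" : "GPL-2.0";

	if (has_suffix(path, ".h"))
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", id);
	return snprintf(buf, len, "// SPDX-License-Identifier: %s", id);
}
```

For a path such as tools/perf/util/srcline.c this yields the same first line that the hunk below shows.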
2017-11-01 21:07:57 +07:00
// SPDX-License-Identifier: GPL-2.0
2017-04-18 01:23:08 +07:00
#include <inttypes.h>
2013-09-11 12:09:28 +07:00
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <linux/kernel.h>
2019-06-26 22:13:13 +07:00
#include <linux/string.h>
2019-07-04 21:32:27 +07:00
#include <linux/zalloc.h>
2013-09-11 12:09:28 +07:00

2013-09-11 12:09:30 +07:00
#include "util/dso.h"
2013-09-11 12:09:28 +07:00
#include "util/debug.h"
2017-03-26 03:34:26 +07:00
#include "util/callchain.h"
2019-06-28 16:23:03 +07:00
#include "util/symbol_conf.h"
2017-04-18 02:30:49 +07:00
#include "srcline.h"
2017-10-31 09:06:54 +07:00
#include "string2.h"
2014-11-13 09:05:27 +07:00
#include "symbol.h"

2015-08-08 05:24:05 +07:00
bool srcline_full_filename;

2017-03-26 03:34:25 +07:00
static const char *dso__name(struct dso *dso)
{
	const char *dso_name;

	if (dso->symsrc_filename)
		dso_name = dso->symsrc_filename;
	else
		dso_name = dso->long_name;

	if (dso_name[0] == '[')
		return NULL;

	if (!strncmp(dso_name, "/tmp/perf-", 10))
		return NULL;

	return dso_name;
}

2017-10-10 03:32:58 +07:00
static int inline_list__append(struct symbol *symbol, char *srcline,
			       struct inline_node *node)
2017-03-26 03:34:26 +07:00
{
	struct inline_list *ilist;

	ilist = zalloc(sizeof(*ilist));
	if (ilist == NULL)
		return -1;

2017-10-10 03:32:57 +07:00
	ilist->symbol = symbol;
2017-10-10 03:32:58 +07:00
	ilist->srcline = srcline;
2017-03-26 03:34:26 +07:00

2017-05-24 13:21:27 +07:00
	if (callchain_param.order == ORDER_CALLEE)
		list_add_tail(&ilist->list, &node->val);
	else
		list_add(&ilist->list, &node->val);
2017-03-26 03:34:26 +07:00

	return 0;
}

2017-10-10 03:32:58 +07:00
/* basename version that takes a const input string */
static const char *gnu_basename(const char *path)
{
	const char *base = strrchr(path, '/');

	return base ? base + 1 : path;
}

static char *srcline_from_fileline(const char *file, unsigned int line)
{
	char *srcline;

	if (!file)
		return NULL;

	if (!srcline_full_filename)
		file = gnu_basename(file);

	if (asprintf(&srcline, "%s:%u", file, line) < 0)
		return NULL;

	return srcline;
}

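As a rough, standalone illustration of what gnu_basename() and srcline_from_fileline() produce, the sketch below formats "file:line" into a caller-supplied buffer. It uses snprintf() instead of asprintf() so that it needs no GNU extensions. The names `basename_const` and `format_srcline`, and the `full_filename` parameter standing in for the srcline_full_filename global, are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the part of `path` after the last '/', or `path` itself. */
static const char *basename_const(const char *path)
{
	const char *base = strrchr(path, '/');

	return base ? base + 1 : path;
}

/*
 * Write "file:line" into buf. Returns snprintf()'s result, or -1 when
 * there is no file name to format.
 */
static int format_srcline(char *buf, size_t len, const char *file,
			  unsigned int line, int full_filename)
{
	if (!file)
		return -1;
	if (!full_filename)
		file = basename_const(file);
	return snprintf(buf, len, "%s:%u", file, line);
}
```

With `full_filename` set to 0, a path such as /usr/include/c++/8.2.1/complex at line 610 is shortened to complex:610.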
2017-10-31 09:06:54 +07:00
static struct symbol *new_inline_sym(struct dso *dso,
				     struct symbol *base_sym,
				     const char *funcname)
{
	struct symbol *inline_sym;
	char *demangled = NULL;

perf report: Don't crash on invalid inline debug information
When the function name for an inline frame is invalid, we must not try
to demangle this symbol, otherwise we crash with:
#0 0x0000555555895c01 in bfd_demangle ()
#1 0x0000555555823262 in demangle_sym (dso=0x555555d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215
#2 dso__demangle_sym (dso=dso@entry=0x555555d92b90, kmodule=<optimized out>, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400
#3 0x00005555557fef4b in new_inline_sym (funcname=0x0, base_sym=0x555555d92b90, dso=0x555555d92b90) at util/srcline.c:89
#4 inline_list__append_dso_a2l (dso=dso@entry=0x555555c7bb00, node=node@entry=0x555555e31810, sym=sym@entry=0x555555d92b90) at util/srcline.c:264
#5 0x00005555557ff27f in addr2line (dso_name=dso_name@entry=0x555555d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0,
line=line@entry=0x0, dso=dso@entry=0x555555c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x555555e31810, sym=0x555555d92b90) at util/srcline.c:313
#6 0x00005555557ffe7c in addr2inlines (sym=0x555555d92b90, dso=0x555555c7bb00, addr=2888, dso_name=0x555555d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf")
at util/srcline.c:358
So instead handle the case where we get invalid function names for
inlined frames and use a fallback '??' function name instead.
While this crash was originally reported by Hadrien for rust code, I can
now also reproduce it with trivial C++ code. Indeed, it seems like
libbfd fails to interpret the debug information for the inline frame
symbol name:
$ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48
main
/usr/include/c++/8.2.1/complex:610
??
/usr/include/c++/8.2.1/complex:618
??
/usr/include/c++/8.2.1/complex:675
??
/usr/include/c++/8.2.1/complex:685
main
/home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39
I've reported this bug upstream and also attached a patch there which
should fix this issue:
https://sourceware.org/bugzilla/show_bug.cgi?id=23715
Reported-by: Hadrien Grasland <grasland@lal.in2p3.fr>
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: a64489c56c30 ("perf report: Find the inline stack for a given address")
[ The above 'Fixes:' cset is where originally the problem was
introduced, i.e. using a2l->funcname without checking if it is NULL,
but this current patch fixes the current codebase, i.e. multiple csets
were applied after a64489c56c30 before the problem was reported by Hadrien ]
Link: http://lkml.kernel.org/r/20180926135207.30263-3-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-09-26 20:52:07 +07:00
	if (!funcname)
		funcname = "??";

2017-10-31 09:06:54 +07:00
	if (dso) {
		demangled = dso__demangle_sym(dso, 0, funcname);
		if (demangled)
			funcname = demangled;
	}

	if (base_sym && strcmp(funcname, base_sym->name) == 0) {
		/* reuse the real, existing symbol */
		inline_sym = base_sym;
		/* ensure that we don't alias an inlined symbol, which could
		 * lead to double frees in inline_node__delete
		 */
		assert(!base_sym->inlined);
	} else {
		/* create a fake symbol for the inline frame */
		inline_sym = symbol__new(base_sym ? base_sym->start : 0,
2019-02-19 20:05:31 +07:00
					 base_sym ? (base_sym->end - base_sym->start) : 0,
2017-10-31 09:06:54 +07:00
					 base_sym ? base_sym->binding : 0,
2018-04-26 21:09:10 +07:00
					 base_sym ? base_sym->type : 0,
					 funcname);
2017-10-31 09:06:54 +07:00
		if (inline_sym)
			inline_sym->inlined = 1;
	}

	free(demangled);

	return inline_sym;
}

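The fix from the "Don't crash on invalid inline debug information" commit comes down to replacing a NULL function name before any string routine sees it. A minimal sketch of that guard, outside of perf, might look like this. The helper names `funcname_or_unknown` and `same_function` are hypothetical and only stand in for the check and the later strcmp() in new_inline_sym().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Never hand a NULL name to later string code; fall back to "??". */
static const char *funcname_or_unknown(const char *funcname)
{
	return funcname ? funcname : "??";
}

/* Stand-in for the comparison against the base symbol's name. */
static int same_function(const char *funcname, const char *base_name)
{
	return strcmp(funcname_or_unknown(funcname), base_name) == 0;
}
```

Calling strcmp() with a NULL argument is undefined behaviour, so the fallback has to run first, which is the order new_inline_sym() uses.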
2013-09-11 12:09:32 +07:00
#ifdef HAVE_LIBBFD_SUPPORT

/*
 * Implement addr2line using libbfd.
 */
#define PACKAGE "perf"
#include <bfd.h>

struct a2l_data {
	const char *input;
2014-12-16 13:19:06 +07:00
	u64 addr;
2013-09-11 12:09:32 +07:00

	bool found;
	const char *filename;
	const char *funcname;
	unsigned line;

	bfd *abfd;
	asymbol **syms;
};

static int bfd_error(const char *string)
{
	const char *errmsg;

	errmsg = bfd_errmsg(bfd_get_error());
	fflush(stdout);

	if (string)
		pr_debug("%s: %s\n", string, errmsg);
	else
		pr_debug("%s\n", errmsg);

	return -1;
}

static int slurp_symtab(bfd *abfd, struct a2l_data *a2l)
{
	long storage;
	long symcount;
	asymbol **syms;
	bfd_boolean dynamic = FALSE;

	if ((bfd_get_file_flags(abfd) & HAS_SYMS) == 0)
		return bfd_error(bfd_get_filename(abfd));

	storage = bfd_get_symtab_upper_bound(abfd);
	if (storage == 0L) {
		storage = bfd_get_dynamic_symtab_upper_bound(abfd);
		dynamic = TRUE;
	}
	if (storage < 0L)
		return bfd_error(bfd_get_filename(abfd));

	syms = malloc(storage);
	if (dynamic)
		symcount = bfd_canonicalize_dynamic_symtab(abfd, syms);
	else
		symcount = bfd_canonicalize_symtab(abfd, syms);

	if (symcount < 0) {
		free(syms);
		return bfd_error(bfd_get_filename(abfd));
	}

	a2l->syms = syms;
	return 0;
}

static void find_address_in_section(bfd *abfd, asection *section, void *data)
{
	bfd_vma pc, vma;
	bfd_size_type size;
	struct a2l_data *a2l = data;
2020-01-28 22:29:38 +07:00
	flagword flags;
2013-09-11 12:09:32 +07:00

	if (a2l->found)
		return;

2020-01-28 22:29:38 +07:00
#ifdef bfd_get_section_flags
	flags = bfd_get_section_flags(abfd, section);
#else
	flags = bfd_section_flags(section);
#endif
	if ((flags & SEC_ALLOC) == 0)
2013-09-11 12:09:32 +07:00
		return;

	pc = a2l->addr;
2020-01-28 22:29:38 +07:00
#ifdef bfd_get_section_vma
2013-09-11 12:09:32 +07:00
	vma = bfd_get_section_vma(abfd, section);
2020-01-28 22:29:38 +07:00
#else
	vma = bfd_section_vma(section);
#endif
#ifdef bfd_get_section_size
2013-09-11 12:09:32 +07:00
	size = bfd_get_section_size(section);
2020-01-28 22:29:38 +07:00
#else
	size = bfd_section_size(section);
#endif
2013-09-11 12:09:32 +07:00

	if (pc < vma || pc >= vma + size)
		return;

	a2l->found = bfd_find_nearest_line(abfd, section, a2l->syms, pc - vma,
					   &a2l->filename, &a2l->funcname,
					   &a2l->line);
2017-08-07 04:24:45 +07:00

	if (a2l->filename && !strlen(a2l->filename))
		a2l->filename = NULL;
2013-09-11 12:09:32 +07:00
}

static struct a2l_data *addr2line_init(const char *path)
{
	bfd *abfd;
	struct a2l_data *a2l = NULL;

	abfd = bfd_openr(path, NULL);
	if (abfd == NULL)
		return NULL;

	if (!bfd_check_format(abfd, bfd_object))
		goto out;

	a2l = zalloc(sizeof(*a2l));
	if (a2l == NULL)
		goto out;

	a2l->abfd = abfd;
	a2l->input = strdup(path);
	if (a2l->input == NULL)
		goto out;

	if (slurp_symtab(abfd, a2l))
		goto out;

	return a2l;

out:
	if (a2l) {
2014-01-09 21:07:59 +07:00
		zfree((char **)&a2l->input);
2013-09-11 12:09:32 +07:00
		free(a2l);
	}
	bfd_close(abfd);
	return NULL;
}

static void addr2line_cleanup(struct a2l_data *a2l)
{
	if (a2l->abfd)
		bfd_close(a2l->abfd);
2014-01-09 21:07:59 +07:00
	zfree((char **)&a2l->input);
2013-12-28 02:55:14 +07:00
	zfree(&a2l->syms);
2013-09-11 12:09:32 +07:00
	free(a2l);
}

2015-09-02 01:47:19 +07:00
#define MAX_INLINE_NEST 1024

2017-05-24 13:21:28 +07:00
static int inline_list__append_dso_a2l(struct dso *dso,
2017-10-10 03:32:57 +07:00
				       struct inline_node *node,
				       struct symbol *sym)
2017-05-24 13:21:28 +07:00
{
	struct a2l_data *a2l = dso->a2l;
2017-10-10 03:32:57 +07:00
	struct symbol *inline_sym = new_inline_sym(dso, sym, a2l->funcname);
2017-10-10 03:32:58 +07:00
	char *srcline = NULL;
2017-05-24 13:21:28 +07:00

2017-10-10 03:32:58 +07:00
	if (a2l->filename)
		srcline = srcline_from_fileline(a2l->filename, a2l->line);

	return inline_list__append(inline_sym, srcline, node);
2017-05-24 13:21:28 +07:00
}

2014-12-16 13:19:06 +07:00
static int addr2line(const char *dso_name, u64 addr,
2015-09-02 01:47:19 +07:00
		     char **file, unsigned int *line, struct dso *dso,
2017-10-10 03:32:57 +07:00
		     bool unwind_inlines, struct inline_node *node,
		     struct symbol *sym)
2013-09-11 12:09:32 +07:00
{
	int ret = 0;
2013-12-03 14:23:07 +07:00
	struct a2l_data *a2l = dso->a2l;

	if (!a2l) {
		dso->a2l = addr2line_init(dso_name);
		a2l = dso->a2l;
	}
2013-09-11 12:09:32 +07:00

	if (a2l == NULL) {
2019-06-28 16:23:03 +07:00
		if (!symbol_conf.disable_add2line_warn)
			pr_warning("addr2line_init failed for %s\n", dso_name);
2013-09-11 12:09:32 +07:00
		return 0;
	}

	a2l->addr = addr;
2013-12-03 14:23:07 +07:00
	a2l->found = false;

2013-09-11 12:09:32 +07:00
	bfd_map_over_sections(a2l->abfd, find_address_in_section, a2l);

2017-05-24 13:21:24 +07:00
	if (!a2l->found)
		return 0;

	if (unwind_inlines) {
2015-09-02 01:47:19 +07:00
		int cnt = 0;

2017-10-10 03:32:57 +07:00
		if (node && inline_list__append_dso_a2l(dso, node, sym))
2017-05-24 13:21:28 +07:00
			return 0;

2015-09-02 01:47:19 +07:00
		while (bfd_find_inliner_info(a2l->abfd, &a2l->filename,
					     &a2l->funcname, &a2l->line) &&
2017-03-26 03:34:26 +07:00
		       cnt++ < MAX_INLINE_NEST) {

2017-08-07 04:24:45 +07:00
			if (a2l->filename && !strlen(a2l->filename))
				a2l->filename = NULL;

2017-03-26 03:34:26 +07:00
			if (node != NULL) {
2017-10-10 03:32:57 +07:00
				if (inline_list__append_dso_a2l(dso, node, sym))
2017-03-26 03:34:26 +07:00
					return 0;
2017-05-24 13:21:24 +07:00
				// found at least one inline frame
				ret = 1;
2017-03-26 03:34:26 +07:00
			}
		}
2015-09-02 01:47:19 +07:00
	}

2017-05-24 13:21:24 +07:00
	if (file) {
		*file = a2l->filename ? strdup(a2l->filename) : NULL;
		ret = *file ? 1 : 0;
2013-09-11 12:09:32 +07:00
	}

2017-05-24 13:21:24 +07:00
	if (line)
		*line = a2l->line;

2013-09-11 12:09:32 +07:00
	return ret;
}

2013-12-03 14:23:07 +07:00
void dso__free_a2l(struct dso *dso)
{
	struct a2l_data *a2l = dso->a2l;

	if (!a2l)
		return;

	addr2line_cleanup(a2l);

	dso->a2l = NULL;
}

2017-03-26 03:34:26 +07:00
static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
2017-10-10 03:32:57 +07:00
					struct dso *dso, struct symbol *sym)
2017-03-26 03:34:26 +07:00
{
	struct inline_node *node;

	node = zalloc(sizeof(*node));
	if (node == NULL) {
		perror("not enough memory for the inline node");
		return NULL;
	}

	INIT_LIST_HEAD(&node->val);
	node->addr = addr;

perf report: Cache failed lookups of inlined frames
When no inlined frames could be found for a given address, we did not
store this information anywhere. That means we potentially do the costly
inliner lookup repeatedly for cases where we know it can never succeed.
This patch makes dso__parse_addr_inlines always return a valid
inline_node. It will be empty when no inliners are found. This enables
us to cache the empty list in the DSO, thereby improving the performance
when many addresses fail to find the inliners.
For my trivial example, the performance impact is already quite
significant:
Before:
~~~~~
Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
594.804032 task-clock (msec) # 0.998 CPUs utilized ( +- 0.07% )
53 context-switches # 0.089 K/sec ( +- 4.09% )
0 cpu-migrations # 0.000 K/sec ( +-100.00% )
5,687 page-faults # 0.010 M/sec ( +- 0.02% )
2,300,918,213 cycles # 3.868 GHz ( +- 0.09% )
4,395,839,080 instructions # 1.91 insn per cycle ( +- 0.00% )
939,177,205 branches # 1578.969 M/sec ( +- 0.00% )
11,824,633 branch-misses # 1.26% of all branches ( +- 0.10% )
0.596246531 seconds time elapsed ( +- 0.07% )
~~~~~
After:
~~~~~
Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
113.111405 task-clock (msec) # 0.990 CPUs utilized ( +- 0.89% )
29 context-switches # 0.255 K/sec ( +- 54.25% )
0 cpu-migrations # 0.000 K/sec
5,380 page-faults # 0.048 M/sec ( +- 0.01% )
432,378,779 cycles # 3.823 GHz ( +- 0.75% )
670,057,633 instructions # 1.55 insn per cycle ( +- 0.01% )
141,001,247 branches # 1246.570 M/sec ( +- 0.01% )
2,346,845 branch-misses # 1.66% of all branches ( +- 0.19% )
0.114222393 seconds time elapsed ( +- 1.19% )
~~~~~
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-3-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-19 18:38:33 +07:00
	addr2line(dso_name, addr, NULL, NULL, dso, true, node, sym);
2017-03-26 03:34:26 +07:00
	return node;
}

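The idea behind the "Cache failed lookups of inlined frames" commit, storing an empty result so the costly lookup is not repeated, can be illustrated with a small negative cache. This is a sketch under simplifying assumptions, not the data structure perf uses. The direct-mapped array, the `lookup_entry` struct, and the `slow_inline_lookup` stand-in are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 64	/* arbitrary size for the sketch */

struct lookup_entry {
	bool valid;		/* slot holds a cached answer */
	uint64_t addr;
	int nr_inlines;		/* 0 is a valid, cached "nothing found" */
};

static struct lookup_entry cache[CACHE_SLOTS];
static unsigned int slow_lookups;	/* counts calls to the costly path */

/* Stand-in for the costly lookup; pretends only even addresses have inlines. */
static int slow_inline_lookup(uint64_t addr)
{
	slow_lookups++;
	return (addr % 2 == 0) ? 2 : 0;
}

static int cached_inline_lookup(uint64_t addr)
{
	struct lookup_entry *e = &cache[addr % CACHE_SLOTS];

	if (e->valid && e->addr == addr)
		return e->nr_inlines;	/* hit, including cached failures */

	/* Miss: do the costly lookup once and remember even an empty result. */
	e->valid = true;
	e->addr = addr;
	e->nr_inlines = slow_inline_lookup(addr);
	return e->nr_inlines;
}
```

The point of the sketch is the `valid` flag: "looked up, found nothing" is kept distinct from "never looked up", which is what always returning a (possibly empty) inline_node achieves in addr2inlines().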
2013-09-11 12:09:32 +07:00
#else /* HAVE_LIBBFD_SUPPORT */

2017-03-26 03:34:25 +07:00
static int filename_split(char *filename, unsigned int *line_nr)
{
	char *sep;

	sep = strchr(filename, '\n');
	if (sep)
		*sep = '\0';

	if (!strcmp(filename, "??:0"))
		return 0;

	sep = strchr(filename, ':');
	if (sep) {
		*sep++ = '\0';
		*line_nr = strtoul(sep, NULL, 0);
		return 1;
	}

	return 0;
}

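To make the behaviour of filename_split() concrete, here is a standalone copy of the same logic under the hypothetical name `split_file_line`, so that it can be compiled and exercised outside of perf. The sample inputs mirror the `file:line` lines that the addr2line tool prints.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split "file:line\n" in place: the file name stays in `filename`, the
 * line number goes to *line_nr. Returns 1 on success, 0 for "??:0" or
 * when no ':' separator is present.
 */
static int split_file_line(char *filename, unsigned int *line_nr)
{
	char *sep;

	sep = strchr(filename, '\n');
	if (sep)
		*sep = '\0';		/* drop the trailing newline */

	if (!strcmp(filename, "??:0"))
		return 0;		/* addr2line found nothing */

	sep = strchr(filename, ':');
	if (sep) {
		*sep++ = '\0';		/* terminate the file name */
		*line_nr = strtoul(sep, NULL, 0);
		return 1;
	}

	return 0;
}
```

Because the split happens at the first ':' in the string, a path that itself contains a colon would be cut short. That limitation is shared with the original function.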
2014-12-16 13:19:06 +07:00
static int addr2line(const char *dso_name, u64 addr,
2013-12-03 14:23:07 +07:00
		     char **file, unsigned int *line_nr,
2015-09-02 01:47:19 +07:00
		     struct dso *dso __maybe_unused,
2017-03-26 03:34:26 +07:00
		     bool unwind_inlines __maybe_unused,
2017-10-10 03:32:57 +07:00
		     struct inline_node *node __maybe_unused,
		     struct symbol *sym __maybe_unused)
2013-09-11 12:09:28 +07:00
{
	FILE *fp;
	char cmd[PATH_MAX];
	char *filename = NULL;
	size_t len;
	int ret = 0;

	scnprintf(cmd, sizeof(cmd), "addr2line -e %s %016"PRIx64,
		  dso_name, addr);

	fp = popen(cmd, "r");
	if (fp == NULL) {
		pr_warning("popen failed for %s\n", dso_name);
		return 0;
	}

	if (getline(&filename, &len, fp) < 0 || !len) {
		pr_warning("addr2line has no output for %s\n", dso_name);
		goto out;
	}

2017-03-26 03:34:25 +07:00
	ret = filename_split(filename, line_nr);
	if (ret != 1) {
2013-09-11 12:09:28 +07:00
		free(filename);
		goto out;
	}

2017-03-26 03:34:25 +07:00
	*file = filename;

2013-09-11 12:09:28 +07:00
out:
	pclose(fp);
	return ret;
}
2013-12-03 14:23:07 +07:00

void dso__free_a2l(struct dso *dso __maybe_unused)
{
}

2017-03-26 03:34:26 +07:00
static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
2017-10-10 03:32:57 +07:00
					struct dso *dso __maybe_unused,
					struct symbol *sym)
2017-03-26 03:34:26 +07:00
{
	FILE *fp;
	char cmd[PATH_MAX];
	struct inline_node *node;
	char *filename = NULL;
2017-10-31 09:06:54 +07:00
	char *funcname = NULL;
	size_t filelen, funclen;
2017-03-26 03:34:26 +07:00
	unsigned int line_nr = 0;

2017-10-31 09:06:54 +07:00
	scnprintf(cmd, sizeof(cmd), "addr2line -e %s -i -f %016"PRIx64,
2017-03-26 03:34:26 +07:00
		  dso_name, addr);

	fp = popen(cmd, "r");
	if (fp == NULL) {
		pr_err("popen failed for %s\n", dso_name);
		return NULL;
	}

	node = zalloc(sizeof(*node));
	if (node == NULL) {
		perror("not enough memory for the inline node");
		goto out;
	}

	INIT_LIST_HEAD(&node->val);
	node->addr = addr;

2017-10-31 09:06:54 +07:00
	/* addr2line -f generates two lines for each inlined functions */
	while (getline(&funcname, &funclen, fp) != -1) {
2017-10-10 03:32:58 +07:00
		char *srcline;
2017-10-31 09:06:54 +07:00
		struct symbol *inline_sym;

2019-06-26 22:13:13 +07:00
		strim(funcname);
2017-10-31 09:06:54 +07:00

		if (getline(&filename, &filelen, fp) == -1)
			goto out;
2017-10-10 03:32:57 +07:00

2017-10-31 09:06:53 +07:00
		if (filename_split(filename, &line_nr) != 1)
2017-03-26 03:34:26 +07:00
			goto out;

2017-10-10 03:32:58 +07:00
		srcline = srcline_from_fileline(filename, line_nr);
2017-10-31 09:06:54 +07:00
		inline_sym = new_inline_sym(dso, sym, funcname);

		if (inline_list__append(inline_sym, srcline, node) != 0) {
			free(srcline);
			if (inline_sym && inline_sym->inlined)
				symbol__delete(inline_sym);
2017-03-26 03:34:26 +07:00
			goto out;
2017-10-31 09:06:54 +07:00
		}
2017-03-26 03:34:26 +07:00
	}

out:
	pclose(fp);
2017-10-31 09:06:53 +07:00
	free(filename);
2017-10-31 09:06:54 +07:00
	free(funcname);
2017-03-26 03:34:26 +07:00

	return node;
}

2013-09-11 12:09:32 +07:00
#endif /* HAVE_LIBBFD_SUPPORT */
2013-09-11 12:09:28 +07:00

2013-12-03 14:23:10 +07:00
/*
 * Number of addr2line failures (without success) before disabling it for that
 * dso.
 */
#define A2L_FAIL_LIMIT 123

2015-09-02 01:47:19 +07:00
char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
2017-12-29 23:26:52 +07:00
		  bool show_sym, bool show_addr, bool unwind_inlines,
		  u64 ip)
2013-09-11 12:09:28 +07:00
{
2013-10-10 10:51:31 +07:00
	char *file = NULL;
	unsigned line = 0;
2013-09-11 12:09:31 +07:00
	char *srcline;
2013-12-11 01:19:23 +07:00
	const char *dso_name;
2013-09-11 12:09:28 +07:00

2013-09-11 12:09:31 +07:00
	if (!dso->has_srcline)
2014-11-13 09:05:24 +07:00
		goto out;
2013-09-11 12:09:31 +07:00

2017-03-26 03:34:25 +07:00
	dso_name = dso__name(dso);
	if (dso_name == NULL)
2013-09-11 12:09:29 +07:00
		goto out;

2017-10-10 03:32:57 +07:00
	if (!addr2line(dso_name, addr, &file, &line, dso,
		       unwind_inlines, NULL, sym))
2013-09-11 12:09:29 +07:00
		goto out;
2013-09-11 12:09:28 +07:00

2017-10-10 03:32:58 +07:00
	srcline = srcline_from_fileline(file, line);
	free(file);

	if (!srcline)
2013-12-03 14:23:10 +07:00
		goto out;

	dso->a2l_fails = 0;
2013-09-11 12:09:28 +07:00

	return srcline;
2013-09-11 12:09:31 +07:00

out:
2013-12-03 14:23:10 +07:00
	if (dso->a2l_fails && ++dso->a2l_fails > A2L_FAIL_LIMIT) {
		dso->has_srcline = 0;
		dso__free_a2l(dso);
	}
2017-03-19 04:49:28 +07:00

	if (!show_addr)
		return (show_sym && sym) ?
			    strndup(sym->name, sym->namelen) : NULL;

2014-11-13 09:05:27 +07:00
	if (sym) {
2014-12-16 13:19:06 +07:00
		if (asprintf(&srcline, "%s+%" PRIu64, show_sym ? sym->name : "",
2017-12-29 23:26:52 +07:00
					ip - sym->start) < 0)
2014-11-13 09:05:27 +07:00
			return SRCLINE_UNKNOWN;
2014-12-16 13:19:06 +07:00
	} else if (asprintf(&srcline, "%s[%" PRIx64 "]", dso->short_name, addr) < 0)
2014-11-13 09:05:24 +07:00
		return SRCLINE_UNKNOWN;
	return srcline;
2013-09-11 12:09:28 +07:00
}

2018-12-04 07:18:48 +07:00
/* Returns filename and fills in line number in line */
char *get_srcline_split(struct dso *dso, u64 addr, unsigned *line)
{
	char *file = NULL;
	const char *dso_name;

	if (!dso->has_srcline)
		goto out;

	dso_name = dso__name(dso);
	if (dso_name == NULL)
		goto out;

	if (!addr2line(dso_name, addr, &file, line, dso, true, NULL, NULL))
		goto out;

	dso->a2l_fails = 0;
	return file;

out:
	if (dso->a2l_fails && ++dso->a2l_fails > A2L_FAIL_LIMIT) {
		dso->has_srcline = 0;
		dso__free_a2l(dso);
	}

	return NULL;
}

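The `a2l_fails` bookkeeping shared by __get_srcline() and get_srcline_split() is easy to misread, so here is a small model of it. The struct and function names are hypothetical. The sketch also assumes the counter starts at a non-zero value, which is not visible in this file; under that assumption a single success (resetting it to 0) switches the failure limit off for good, matching the "without success" wording in the comment above A2L_FAIL_LIMIT.

```c
#include <assert.h>
#include <stdbool.h>

#define FAIL_LIMIT 123

struct lookup_state {
	unsigned int fails;	/* 0 means "succeeded once, stop counting" */
	bool enabled;		/* plays the role of dso->has_srcline */
};

static void lookup_state_init(struct lookup_state *s)
{
	s->fails = 1;		/* assumption: a non-zero start arms the counter */
	s->enabled = true;
}

static void lookup_succeeded(struct lookup_state *s)
{
	s->fails = 0;
}

static void lookup_failed(struct lookup_state *s)
{
	/* Same shape as the check at the `out:` labels above. */
	if (s->fails && ++s->fails > FAIL_LIMIT)
		s->enabled = false;
}
```

In this model a source that never succeeds is disabled on its 123rd consecutive failure, while one that has succeeded at least once is never disabled.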
2013-09-11 12:09:28 +07:00
void free_srcline(char *srcline)
{
	if (srcline && strcmp(srcline, SRCLINE_UNKNOWN) != 0)
		free(srcline);
}
2015-09-02 01:47:19 +07:00

char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
2017-12-29 23:26:52 +07:00
		  bool show_sym, bool show_addr, u64 ip)
2015-09-02 01:47:19 +07:00
{
2017-12-29 23:26:52 +07:00
	return __get_srcline(dso, addr, sym, show_sym, show_addr, false, ip);
2015-09-02 01:47:19 +07:00
}
2017-03-26 03:34:26 +07:00

perf report: Cache srclines for callchain nodes
On one hand this ensures that the memory is properly freed when the DSO
gets freed. On the other hand this significantly speeds up the
processing of the callchain nodes when lots of srclines are requested.
For one of my data files e.g.:
Before:
Performance counter stats for 'perf report -s srcline -g srcline --stdio':
52496.495043 task-clock (msec) # 0.999 CPUs utilized
634 context-switches # 0.012 K/sec
2 cpu-migrations # 0.000 K/sec
191,561 page-faults # 0.004 M/sec
165,074,498,235 cycles # 3.144 GHz
334,170,832,408 instructions # 2.02 insn per cycle
90,220,029,745 branches # 1718.591 M/sec
654,525,177 branch-misses # 0.73% of all branches
52.533273822 seconds time elapsed
Processed 236605 events and lost 40 chunks!
After:
Performance counter stats for 'perf report -s srcline -g srcline --stdio':
22606.323706 task-clock (msec) # 1.000 CPUs utilized
31 context-switches # 0.001 K/sec
0 cpu-migrations # 0.000 K/sec
185,471 page-faults # 0.008 M/sec
71,188,113,681 cycles # 3.149 GHz
133,204,943,083 instructions # 1.87 insn per cycle
34,886,384,979 branches # 1543.214 M/sec
278,214,495 branch-misses # 0.80% of all branches
22.609857253 seconds time elapsed
Note that the difference is only this large when `--inline` is not
passed. When it is passed, we use the inliner cache instead and thus do
not run this code path that often.
I think that this cache should actually be used in other places, too.
When looking at the valgrind leak report for perf report, we see tons of
srclines being leaked, most notably from calls to
hist_entry__get_srcline. The problem is that get_srcline has many
different formatting options (show_sym, show_addr, potentially even
unwind_inlines when calling __get_srcline directly). As such, the
srcline cannot easily be cached for all calls, or we'd have to add
caches for all formatting combinations (6 so far). An alternative would
be to remove the formatting options and handle that on a different level
- i.e. print the sym/addr on demand wherever we actually output
something. And the unwind_inlines could be moved into a separate
function that does not return the srcline.
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-19 18:38:34 +07:00
|
|
|
struct srcline_node {
|
|
|
|
u64 addr;
|
|
|
|
char *srcline;
|
|
|
|
struct rb_node rb_node;
|
|
|
|
};
|
|
|
|
|
2018-12-07 02:18:15 +07:00
|
|
|
void srcline__tree_insert(struct rb_root_cached *tree, u64 addr, char *srcline)
|
2017-10-19 18:38:34 +07:00
|
|
|
{
|
2018-12-07 02:18:15 +07:00
|
|
|
struct rb_node **p = &tree->rb_root.rb_node;
|
2017-10-19 18:38:34 +07:00
|
|
|
struct rb_node *parent = NULL;
|
|
|
|
struct srcline_node *i, *node;
|
2018-12-07 02:18:15 +07:00
|
|
|
bool leftmost = true;
|
2017-10-19 18:38:34 +07:00
|
|
|
|
|
|
|
node = zalloc(sizeof(struct srcline_node));
|
|
|
|
if (!node) {
|
|
|
|
perror("not enough memory for the srcline node");
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
node->addr = addr;
|
|
|
|
node->srcline = srcline;
|
|
|
|
|
|
|
|
while (*p != NULL) {
|
|
|
|
parent = *p;
|
|
|
|
i = rb_entry(parent, struct srcline_node, rb_node);
|
|
|
|
if (addr < i->addr)
|
|
|
|
p = &(*p)->rb_left;
|
2018-12-07 02:18:15 +07:00
|
|
|
else {
|
2017-10-19 18:38:34 +07:00
|
|
|
p = &(*p)->rb_right;
|
2018-12-07 02:18:15 +07:00
|
|
|
leftmost = false;
|
|
|
|
}
|
2017-10-19 18:38:34 +07:00
|
|
|
}
|
|
|
|
rb_link_node(&node->rb_node, parent, p);
|
2018-12-07 02:18:15 +07:00
|
|
|
rb_insert_color_cached(&node->rb_node, tree, leftmost);
|
2017-10-19 18:38:34 +07:00
|
|
|
}
|
|
|
|
|
2018-12-07 02:18:15 +07:00
|
|
|
char *srcline__tree_find(struct rb_root_cached *tree, u64 addr)
|
2017-10-19 18:38:34 +07:00
|
|
|
{
|
2018-12-07 02:18:15 +07:00
|
|
|
struct rb_node *n = tree->rb_root.rb_node;
|
2017-10-19 18:38:34 +07:00
|
|
|
|
|
|
|
while (n) {
|
|
|
|
struct srcline_node *i = rb_entry(n, struct srcline_node,
|
|
|
|
rb_node);
|
|
|
|
|
|
|
|
if (addr < i->addr)
|
|
|
|
n = n->rb_left;
|
|
|
|
else if (addr > i->addr)
|
|
|
|
n = n->rb_right;
|
|
|
|
else
|
|
|
|
return i->srcline;
|
|
|
|
}
|
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2018-12-07 02:18:15 +07:00
|
|
|
void srcline__tree_delete(struct rb_root_cached *tree)
|
2017-10-19 18:38:34 +07:00
|
|
|
{
|
|
|
|
struct srcline_node *pos;
|
2018-12-07 02:18:15 +07:00
|
|
|
struct rb_node *next = rb_first_cached(tree);
|
2017-10-19 18:38:34 +07:00
|
|
|
|
|
|
|
while (next) {
|
|
|
|
pos = rb_entry(next, struct srcline_node, rb_node);
|
|
|
|
next = rb_next(&pos->rb_node);
|
2018-12-07 02:18:15 +07:00
|
|
|
rb_erase_cached(&pos->rb_node, tree);
|
2017-10-19 18:38:34 +07:00
		free_srcline(pos->srcline);
		zfree(&pos);
	}
}
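
The loop above releases each node with zfree(&pos), which frees the allocation and also clears the caller's pointer. A minimal sketch of that idiom follows; DEMO_ZFREE is a hypothetical name chosen for illustration and is not the helper perf itself uses.

```c
#include <stdlib.h>

/*
 * Hypothetical stand-in for a zfree()-style helper: release the
 * allocation and reset the pointer, so that a later use of the stale
 * pointer shows up as a NULL dereference and a second free is harmless.
 */
#define DEMO_ZFREE(ptr)			\
	do {				\
		free(*(ptr));		\
		*(ptr) = NULL;		\
	} while (0)
```

Because free(NULL) is a no-op, applying the macro twice to the same pointer is safe.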

struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
					    struct symbol *sym)
{
	const char *dso_name;

	dso_name = dso__name(dso);
	if (dso_name == NULL)
		return NULL;

	return addr2inlines(dso_name, addr, dso, sym);
}

void inline_node__delete(struct inline_node *node)
{
	struct inline_list *ilist, *tmp;

	list_for_each_entry_safe(ilist, tmp, &node->val, list) {
		list_del_init(&ilist->list);
		free_srcline(ilist->srcline);
		/* only the inlined symbols are owned by the list */
		if (ilist->symbol && ilist->symbol->inlined)
			symbol__delete(ilist->symbol);
		free(ilist);
	}

	free(node);
}
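
The ownership rule in inline_node__delete() (free a symbol only when it is marked as inlined, leave borrowed symbols alone) can be sketched in isolation. Everything prefixed demo_ below is hypothetical and uses a plain singly linked list rather than the kernel's list_head; it is an illustration of the rule, not the perf implementation.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct symbol: only the ownership flag. */
struct demo_symbol {
	bool inlined;	/* true: allocated for the list; false: borrowed */
};

struct demo_item {
	struct demo_symbol *symbol;
	struct demo_item *next;
};

/* Prepend an item; on allocation failure the list is returned unchanged. */
static struct demo_item *demo_list_push(struct demo_item *head,
					struct demo_symbol *symbol)
{
	struct demo_item *item = malloc(sizeof(*item));

	if (!item)
		return head;
	item->symbol = symbol;
	item->next = head;
	return item;
}

/* Free every item; return how many symbols were freed along with them. */
static int demo_list_delete(struct demo_item *head)
{
	struct demo_item *item = head, *tmp;
	int symbols_freed = 0;

	while (item) {
		tmp = item->next;	/* saved before item is freed */
		/* Only symbols flagged as inlined belong to the list. */
		if (item->symbol && item->symbol->inlined) {
			free(item->symbol);
			symbols_freed++;
		}
		free(item);
		item = tmp;
	}
	return symbols_freed;
}
```

A borrowed symbol survives the teardown, so whoever owns it elsewhere can keep using it.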

void inlines__tree_insert(struct rb_root_cached *tree,
			  struct inline_node *inlines)
{
	struct rb_node **p = &tree->rb_root.rb_node;
	struct rb_node *parent = NULL;
	const u64 addr = inlines->addr;
	struct inline_node *i;
	bool leftmost = true;

	while (*p != NULL) {
		parent = *p;
		i = rb_entry(parent, struct inline_node, rb_node);
		if (addr < i->addr)
			p = &(*p)->rb_left;
		else {
			p = &(*p)->rb_right;
			leftmost = false;
		}
	}
	rb_link_node(&inlines->rb_node, parent, p);
	rb_insert_color_cached(&inlines->rb_node, tree, leftmost);
}

struct inline_node *inlines__tree_find(struct rb_root_cached *tree, u64 addr)
{
	struct rb_node *n = tree->rb_root.rb_node;

	while (n) {
		struct inline_node *i = rb_entry(n, struct inline_node,
						 rb_node);

		if (addr < i->addr)
			n = n->rb_left;
		else if (addr > i->addr)
			n = n->rb_right;
		else
			return i;
	}

	return NULL;
}
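
The insert and find routines above share one descent by address, and the insert additionally tracks whether the new node becomes the leftmost one so that the cached root can be updated. The sketch below reproduces only that logic on a plain, unbalanced binary search tree. The demo_ types are hypothetical stand-ins for struct rb_node and struct rb_root_cached, and no red-black rebalancing is performed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct rb_node / struct rb_root_cached. */
struct demo_node {
	uint64_t addr;
	struct demo_node *left, *right;
};

struct demo_root_cached {
	struct demo_node *root;
	struct demo_node *leftmost;	/* cached smallest-address node */
};

/* Same descent as inlines__tree_insert(), minus the rebalancing. */
static void demo_insert(struct demo_root_cached *tree, struct demo_node *node)
{
	struct demo_node **p = &tree->root;
	bool leftmost = true;

	while (*p != NULL) {
		if (node->addr < (*p)->addr) {
			p = &(*p)->left;
		} else {
			p = &(*p)->right;
			/* A single right turn means a smaller node exists. */
			leftmost = false;
		}
	}

	node->left = node->right = NULL;
	*p = node;
	if (leftmost)
		tree->leftmost = node;
}

/* Exact-match lookup by address, mirroring inlines__tree_find(). */
static struct demo_node *demo_find(const struct demo_root_cached *tree,
				   uint64_t addr)
{
	struct demo_node *n = tree->root;

	while (n) {
		if (addr < n->addr)
			n = n->left;
		else if (addr > n->addr)
			n = n->right;
		else
			return n;
	}
	return NULL;
}
```

Keeping the leftmost pointer up to date during insertion is what lets a later walk start from the first node without descending from the root.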

void inlines__tree_delete(struct rb_root_cached *tree)
{
	struct inline_node *pos;
	struct rb_node *next = rb_first_cached(tree);

	while (next) {
		pos = rb_entry(next, struct inline_node, rb_node);
		next = rb_next(&pos->rb_node);
		rb_erase_cached(&pos->rb_node, tree);
		inline_node__delete(pos);
	}
}