linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2025-01-14 02:16:09 +07:00

Author	SHA1	Message	Date
Alexey Budankov	9c080c0279	perf mmap: Declare type for cpu mask of arbitrary length Declare a dedicated struct map_cpu_mask type for cpu masks of arbitrary length. The mask is available thru bits pointer and the mask length is kept in nbits field. MMAP_CPU_MASK_BYTES() macro returns mask storage size in bytes. The mmap_cpu_mask__scnprintf() function can be used to log text representation of the mask. Committer notes: To print the 'nbits' struct member we must use %zd, since it is a size_t, this fixes the build in some toolchains/arches. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/0fd2454f-477f-d15a-f4ee-79bcbd2585ff@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-01-06 11:46:09 -03:00
Yuya Fujita	55347ec340	perf hists: Fix variable name's inconsistency in hists__for_each() macro Variable names are inconsistent in hists__for_each macro(). Due to this inconsistency, the macro replaces its second argument with "fmt" regardless of its original name. So far it works because only "fmt" is passed to the second argument. However, this behavior is not expected and should be fixed. Fixes: `f0786af536` ("perf hists: Introduce hists__for_each_format macro") Fixes: `aa6f50af82` ("perf hists: Introduce hists__for_each_sort_list macro") Signed-off-by: Yuya Fujita <fujita.yuya@fujitsu.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/OSAPR01MB1588E1C47AC22043175DE1B2E8520@OSAPR01MB1588.jpnprd01.prod.outlook.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-20 18:58:13 -03:00
Arnaldo Carvalho de Melo	a75af86b6f	perf map: Set kmap->kmaps backpointer for main kernel map chunks When a map is create to represent the main kernel area (vmlinux) with map__new2() we allocate an extra area to store a pointer to the 'struct maps' for the kernel maps, so that we can access that struct when loading ELF files or kallsyms, as we will need to split it in multiple maps, one per kernel module or ELF section (such as ".init.text"). So when map->dso->kernel is non-zero, it is expected that map__kmap(map)->kmaps to be set to the tree of kernel maps (modules, chunks of the main kernel, bpf progs put in place via PERF_RECORD_KSYMBOL, the main kernel). This was not the case when we were splitting the main kernel into chunks for its ELF sections, which ended up making 'perf report --children' processing a perf.data file with callchains to trip on __map__is_kernel(), when we press ENTER to see the popup menu for main histogram entries that starts at a symbol in the ".init.text" ELF section, e.g.: - 8.83% 0.00% swapper [kernel.vmlinux].init.text [k] start_kernel start_kernel cpu_startup_entry do_idle cpuidle_enter cpuidle_enter_state intel_idle Fix it. Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/20191218190120.GB13282@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-20 18:55:40 -03:00
Jin Yao	0feba17bd7	perf report: Fix incorrectly added dimensions as switch perf data file We observed an issue that was some extra columns displayed after switching perf data file in browser. The steps to reproduce: 1. perf record -a -e cycles,instructions -- sleep 3 2. perf report --group 3. In browser, we use hotkey 's' to switch to another perf.data 4. Now in browser, the extra columns 'Self' and 'Children' are displayed. The issue is setup_sorting() executed again after repeat path, so dimensions are added again. This patch checks the last key returned from __cmd_report(). If it's K_SWITCH_INPUT_DATA, skips the setup_sorting(). Fixes: `ad0de0971b` ("perf report: Enable the runtime switching of perf data file") Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191220013722.20592-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-20 18:49:27 -03:00
Ed Maste	58b3bafff8	perf vendor events s390: Remove name from L1D_RO_EXCL_WRITES description In `7fcfa9a2d9` an unintended prefix "Counter:18 Name:" was removed from the description for L1D_RO_EXCL_WRITES, but the extra name remained in the description. Remove it too. Fixes: `7fcfa9a2d9` ("perf list: Fix s390 counter long description for L1D_RO_EXCL_WRITES") Signed-off-by: Ed Maste <emaste@freebsd.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Vincent Chen <deanbo422@gmail.com> Link: http://lore.kernel.org/lkml/20191212145346.5026-1-emaste@freefall.freebsd.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-16 13:40:26 -03:00
Ed Maste	28396b7df0	perf vendor events s390: Fix counter long description for DTLB1_GPAGE_WRITES The cf_z13 counter DTLB1_GPAGE_WRITES included a prefix 'Counter:132\tName:'. This is incorrect; remove the prefix as with `7fcfa9a2d9` for cf_z14. Signed-off-by: Ed Maste <emaste@freebsd.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Vincent Chen <deanbo422@gmail.com> Link: http://lore.kernel.org/lkml/20191212143446.88582-1-emaste@freefall.freebsd.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-16 13:40:26 -03:00
Michael Petlan	2870782687	perf header: Fix false warning when there are no duplicate cache entries Before this patch, perf expected that there might be NPROC4 unique cache entries at max, however, it also expected that some of them would be shared and/or of the same size, thus the final number of entries would be reduced to be lower than NPROC4. In case the number of entries hadn't been reduced (was NPROC4), the warning was printed. However, some systems might have unusual cache topology, such as the following two-processor KVM guest: cpu level shared_cpu_list size 0 1 0 32K 0 1 0 64K 0 2 0 512K 0 3 0 8192K 1 1 1 32K 1 1 1 64K 1 2 1 512K 1 3 1 8192K This KVM guest has 8 (NPROC4) unique cache entries, which used to make perf printing the message, although there actually aren't "way too many cpu caches". v2: Removing unused argument. v3: Unifying the way we obtain number of cpus. v4: Removed '& UINT_MAX' construct which is redundant. Signed-off-by: Michael Petlan <mpetlan@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> LPU-Reference: 20191208162056.20772-1-mpetlan@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-11 12:28:14 -03:00
Kajol Jain	eb573e746b	perf metricgroup: Fix printing event names of metric group with multiple events Commit `f01642e491` ("perf metricgroup: Support multiple events for metricgroup") introduced support for multiple events in a metric group. But with the current upstream, metric events names are not printed properly In power9 platform: command:# ./perf stat --metric-only -M translation -C 0 -I 1000 sleep 2 1.000208486 2.000368863 2.001400558 Similarly in skylake platform: command:./perf stat --metric-only -M Power -I 1000 1.000579994 2.002189493 With current upstream version, issue is with event name comparison logic in find_evsel_group(). Current logic is to compare events belonging to a metric group to the events in perf_evlist. Since the break statement is missing in the loop used for comparison between metric group and perf_evlist events, the loop continues to execute even after getting a pattern match, and end up in discarding the matches. Incase of single metric event belongs to metric group, its working fine, because in case of single event once it compare all events it reaches to end of perf_evlist. Example for single metric event in power9 platform: command:# ./perf stat --metric-only -M branches_per_inst -I 1000 sleep 1 1.000094653 0.2 1.001337059 0.0 This patch fixes the issue by making sure once we found all events belongs to that metric event matched in find_evsel_group(), we successfully break from that loop by adding corresponding condition. With this patch: In power9 platform: command:# ./perf stat --metric-only -M translation -C 0 -I 1000 sleep 2 result:# time derat_4k_miss_rate_percent derat_4k_miss_ratio derat_miss_ratio derat_64k_miss_rate_percent derat_64k_miss_ratio dslb_miss_rate_percent islb_miss_rate_percent 1.000135672 0.0 0.3 1.0 0.0 0.2 0.0 0.0 2.000380617 0.0 0.0 0.0 0.0 0.0 0.0 0.0 command:# ./perf stat --metric-only -M Power -I 1000 Similarly in skylake platform: result:# time Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency 1.000563580 0.3 0.0 2.6 44.2 21.9 0.0 0.0 0.0 2.002235027 0.4 0.0 2.7 43.0 20.7 0.0 0.0 0.0 Committer testing: Before: [root@seventh ~]# perf stat --metric-only -M Power -I 1000 # time 1.000383223 2.001168182 3.001968545 4.002741200 5.003442022 ^C 5.777687244 [root@seventh ~]# After the patch: [root@seventh ~]# perf stat --metric-only -M Power -I 1000 # time Turbo_Utilization C3_Core_Residency C6_Core_Residency C7_Core_Residency C2_Pkg_Residency C3_Pkg_Residency C6_Pkg_Residency C7_Pkg_Residency 1.000406577 0.4 0.1 1.4 97.0 0.0 0.0 0.0 0.0 2.001481572 0.3 0.0 0.6 97.9 0.0 0.0 0.0 0.0 3.002332585 0.2 0.0 1.0 97.5 0.0 0.0 0.0 0.0 4.003196624 0.2 0.0 0.3 98.6 0.0 0.0 0.0 0.0 5.004063851 0.3 0.0 0.7 97.7 0.0 0.0 0.0 0.0 ^C 5.471260276 0.2 0.0 0.5 49.3 0.0 0.0 0.0 0.0 [root@seventh ~]# [root@seventh ~]# dmesg \| grep -i skylake [ 0.187807] Performance Events: PEBS fmt3+, Skylake events, 32-deep LBR, full-width counters, Intel PMU driver. [root@seventh ~]# Fixes: `f01642e491` ("perf metricgroup: Support multiple events for metricgroup") Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191120084059.24458-1-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-11 12:28:14 -03:00
Ravi Bangoria	0dd674efaf	perf/x86/pmu-events: Fix Kernel_Utilization metric Kernel Utilization should divide ref cycles spent in kernel with total ref cycles. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Haiyan Song <haiyanx.song@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20191204162121.29998-1-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-11 12:28:14 -03:00
Arnaldo Carvalho de Melo	61208e6e10	perf top: Do not bail out when perf_env__read_cpuid() returns ENOSYS 'perf top' stopped working on hw architectures that do not provide a get_cpuid() implementation and thus fallback to the weak get_cpuid() default function. This is done because at annotation time we may need it in the arch specific annotation init routine, but that is only being used by arches that do provide a get_cpuid() implementation: $ find tools/ -name ".[ch]" \| xargs grep 'evlist->env' tools/perf/builtin-top.c: top.evlist->env = &perf_env; tools/perf/util/evsel.c: return evsel->evlist->env; tools/perf/util/s390-cpumsf.c: sf->machine_type = s390_cpumsf_get_type(session->evlist->env->cpuid); tools/perf/util/header.c: session->evlist->env = &header->env; tools/perf/util/sample-raw.c: const char arch_pf = perf_env__arch(evlist->env); $ $ find tools/perf/arch -name ".[ch]" \| xargs grep -w get_cpuid tools/perf/arch/x86/util/auxtrace.c: ret = get_cpuid(buffer, sizeof(buffer)); tools/perf/arch/x86/util/header.c:get_cpuid(char buffer, size_t sz) tools/perf/arch/powerpc/util/header.c:get_cpuid(char buffer, size_t sz) tools/perf/arch/s390/util/header.c: Implementation of get_cpuid(). tools/perf/arch/s390/util/header.c:int get_cpuid(char *buffer, size_t sz) tools/perf/arch/s390/util/header.c: if (buf && get_cpuid(buf, 128)) $ For 'report' or 'script', i.e. tools working on perf.data files, that is setup while reading the header, its just top that needs to explicitely read it at tool start. Fixes: `608127f737` ("perf top: Initialize perf_env->cpuid, needed by the per arch annotation init routine") Reported-by: John Garry <john.garry@huawei.com> Analysed-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Tested-by: John Garry <john.garry@huawei.com> # arm64 Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/n/tip-lxwjr0cd2eggzx04a780ffrv@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-11 12:26:25 -03:00
Arnaldo Carvalho de Melo	05267c7eac	perf arch: Make the default get_cpuid() return compatible error Some of the functions calling get_cpuid() propagate back the error it returns, and all are using errno (positive) values, make the weak default get_cpuid() function return ENOSYS to be consistent and to allow checking if this is an arch not providing this function or if a provided one is having trouble getting the cpuid, to decide if the warning should be provided to the user or just a debug message should be emitted. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Tested-by: John Garry <john.garry@huawei.com> # arm64 Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/n/tip-lxwjr0cd2eggzx04a780ffrv@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-11 12:25:14 -03:00
Arnaldo Carvalho de Melo	761bfc33dd	Merge remote-tracking branch 'torvalds/master' into perf/urgent To pick up BPF fixes to allow a clean 'make -C tools/perf build-test': `7c3977d1e8` libbpf: Fix sym->st_value print on 32-bit arches `1fd450f992` libbpf: Fix up generation of bpf_helper_defs.h Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-11 09:58:16 -03:00
Adrian Hunter	29f6eeca0e	perf inject: Fix processing of ID index for injected instruction tracing The ID index event is used when decoding, but can result in the following error: $ perf record --aux-sample -e '{intel_pt//,branch-misses}:u' ls $ perf inject -i perf.data -o perf.data.inj --itrace=be $ perf script -i perf.data.inj 0x1020 [0x410]: failed to process type: 69 [No such file or directory] Fix by having 'perf inject' drop the ID index event. Fixes: `c0a6de06c4` ("perf record: Add support for AUX area sampling") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191204120800.8138-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-04 12:39:53 -03:00
Ravi Bangoria	bb30acae4c	perf report: Bail out --mem-mode if mem info is not available If perf.data is recorded without -d, don't allow user to use --mem-mode with 'perf report'. symbol_daddr and phys_daddr can be recorded separately and may be present in the perf.data but at the report time they are associated with mem-mode fields and thus this restriction applies to them as well. Before: $ perf record ls $ perf report --mem-mode --stdio # Overhead Local Weight Memory access Symbol # ........ ............ ............. ....................... 55.56% 0 N/A [k] 0xffffffff81a00ae7 After: $ perf report --mem-mode --stdio Error: Selected --mem-mode but no mem data. Did you call perf record without -d? Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20191114132213.5419-4-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-04 12:34:02 -03:00
Ravi Bangoria	aa6b3c9923	perf report: Make -F more strict like -s Currently -F allows branch-mode / mem-mode fields with -F even when perf report is not running in that mode. Don't allow that. Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20191114132213.5419-3-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-04 12:32:40 -03:00
Ravi Bangoria	ae87405fb5	perf report/top TUI: Replace pr_err() with ui__error() pr_err() in TUI mode does not print anyting on the screen and just quits. Replace such pr_err() with ui__error(). Before: $ perf report -s + $ After: $ perf report -s + ┌─Error:────────────────┐ │Invalid --sort key: `+'│ │ │ │Press any key... │ └───────────────────────┘ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20191114132213.5419-2-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-04 12:27:14 -03:00
David S. Miller	734c7022ad	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2019-12-02 The following pull-request contains BPF updates for your net tree. We've added 10 non-merge commits during the last 6 day(s) which contain a total of 10 files changed, 60 insertions(+), 51 deletions(-). The main changes are: 1) Fix vmlinux BTF generation for binutils pre v2.25, from Stanislav Fomichev. 2) Fix libbpf global variable relocation to take symbol's st_value offset into account, from Andrii Nakryiko. 3) Fix libbpf build on powerpc where check_abi target fails due to different readelf output format, from Aurelien Jarno. 4) Don't set BPF insns RO for the case when they are JITed in order to avoid fragmenting the direct map, from Daniel Borkmann. 5) Fix static checker warning in btf_distill_func_proto() as well as a build error due to empty enum when BPF is compiled out, from Alexei Starovoitov. 6) Fix up generation of bpf_helper_defs.h for perf, from Arnaldo Carvalho de Melo. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-02 10:50:29 -08:00
Arnaldo Carvalho de Melo	9974406884	perf kvm: Clarify the 'perf kvm' -i and -o command line options The 'perf kvm' subcommand has options that it in turn passes to other perf subcommands such as 'report' and 'record', particularly -i and -o end up setting the same variable that will then be used for 'record's -o and report '-i', which ends up being confusing, leading some to think that both -i and -o can be used with 'report'. Improve the man page to state that -i is used with the post-processing subcommands while -o is used just with 'record' and that to save the output of 'report' one should simply redirect its output to a file. Noticed while reading the https://www.linux-kvm.org/page/Perf_events page. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steve Dickson <steved@redhat.com> Cc: William Cohen <wcohen@redhat.com> Link: https://lkml.kernel.org/n/tip-tclbttvmgtm525fvmh85f7d9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-02 15:38:59 -03:00
Arnaldo Carvalho de Melo	f6661125ff	perf beauty: Add CLEAR_SIGHAND support for clone's flags arg Add support for the recently added CLONE_CLEAR_SIGHAND flag. This takes advantage of the copy of the uapi/linux/sched.h we have in tools/include, which allows us to build tools/perf in older systems and have the binary support printing that flag whenever that system gets its kernel updated to one where this feature is present. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Adrian Reber <areber@redhat.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org Link: https://lkml.kernel.org/n/tip-1vnz497ubtu5oz16ygdcul0e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-02 15:19:52 -03:00
Arnaldo Carvalho de Melo	bd5c6b81dd	perf bench: Update the copies of x86's mem{cpy,set}_64.S And update linux/linkage.h, which requires in turn that we make these files switch from ENTRY()/ENDPROC() to SYM_FUNC_START()/SYM_FUNC_END(): tools/perf/arch/arm64/tests/regs_load.S tools/perf/arch/arm/tests/regs_load.S tools/perf/arch/powerpc/tests/regs_load.S tools/perf/arch/x86/tests/regs_load.S We also need to switch SYM_FUNC_START_LOCAL() to SYM_FUNC_START() for the functions used directly by 'perf bench', and update tools/perf/check_headers.sh to ignore those changes when checking if the kernel original files drifted from the copies we carry. This is to get the changes from: `6dcc5627f6` ("x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_*") `ef1e03152c` ("x86/asm: Make some functions local") `e9b9d020c4` ("x86/asm: Annotate aliases") And address these tools/perf build warnings: Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S' diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S Warning: Kernel ABI header at 'tools/arch/x86/lib/memset_64.S' differs from latest version at 'arch/x86/lib/memset_64.S' diff -u tools/arch/x86/lib/memset_64.S arch/x86/lib/memset_64.S Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-tay3l8x8k11p7y3qcpqh9qh5@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-12-02 11:40:57 -03:00
Arnaldo Carvalho de Melo	77b91c1a52	perf machine: Fill map_symbol->maps in append_inlines() to fix segfault I forgot to fill in the map_symbol->maps field in append_inlines() which then makes code down the line segfault when trying to deref it. It doesn't make any sense to have an addr_location with its 'map' member not NULL while its 'maps' is NULL, after all al->maps is where al->map is in. It is done that way so that we don't have to have in each 'struct map' a pointer to the 'struct maps' it is in, as we had in the past when we would have 'map->mg', before 'struct maps' was combined with 'struct map_groups', because there was always a one-to-one relationship for these structs. This fixes a segfault when processing DWARF callgraphs in 'perf report'. Reported-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: `08f6680e62` ("perf tools: Add a 'struct map_groups' pointer to 'struct map_symbol'") Link: http://lore.kernel.org/lkml/20191129160631.GD26963@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-29 16:11:06 -03:00
Ian Rogers	fa7f7e7354	perf jit: Move test functionality in to a test Adds a test for minimal jit_write_elf functionality. Committer testing: # perf test jit 61: Test jit_write_elf : Ok # # perf test -v jit 61: Test jit_write_elf : --- start --- test child forked, pid 10460 Writing jit code to: /tmp/perf-test-KqxURR test child finished with 0 ---- end ---- Test jit_write_elf: Ok # Committer notes: Fix up the case where HAVE_JITDUMP is no defined. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexios Zavras <alexios.zavras@intel.com> Cc: Allison Randal <allison@lohutok.net> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20191126235913.41855-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-29 12:20:45 -03:00
Andi Kleen	704e2f5b70	perf stat: Use affinity for enabling/disabling events Restructure event enabling/disabling to use affinity, which minimizes the number of IPIs needed. Before on a large test case with 94 CPUs: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 54.65 1.899986 22 84812 660 ioctl after: 39.21 0.930451 10 84796 644 ioctl Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-13-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-29 12:20:45 -03:00
Andi Kleen	363fb12189	perf evsel: Add functions to enable/disable for a specific CPU Refactor the existing functions to use these functions internally. Used in the next patch. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-12-andi@firstfloor.org Link: http://lore.kernel.org/lkml/20191127232657.GL84886@tassilo.jf.intel.com # Fix Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-29 12:20:45 -03:00
Andi Kleen	4b49ab708d	perf stat: Use affinity for reading Restructure event reading to use affinity to minimize the number of IPIs needed. Before on a large test case with 94 CPUs: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 3.16 0.106079 4 22082 read After: 3.43 0.081295 3 22082 read Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-11-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-29 12:20:45 -03:00
Andi Kleen	4804e01116	perf stat: Use affinity for opening events Restructure the event opening in perf stat to cycle through the events by CPU after setting affinity to that CPU. This eliminates IPI overhead in the perf API. We have to loop through the CPU in the outter builtin-stat code instead of leaving that to low level functions. It has to change the weak group fallback strategy slightly. Since we cannot easily undo the opens for other CPUs move the weak group retry to a separate loop. Before with a large test case with 94 CPUs: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 42.75 4.050910 67 60046 110 perf_event_open After: 26.86 0.944396 16 58069 110 perf_event_open (the number changes slightly because the weak group retries work differently and the test case relies on weak groups) Committer notes: Added one of the hunks in a patch provided by Andi after I noticed that the "event times" 'perf test' entry was segfaulting. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-10-andi@firstfloor.org Link: http://lore.kernel.org/lkml/20191127232657.GL84886@tassilo.jf.intel.com # Fix Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-29 12:20:45 -03:00
Andi Kleen	e0e6a6ca3a	perf stat: Factor out open error handling Factor out the open error handling into a separate function. This is useful for followon patches who need to duplicate this. No behavior change intended. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-9-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-29 12:20:45 -03:00
Andi Kleen	7736627b86	perf stat: Use affinity for closing file descriptors Closing a perf fd can also trigger an IPI to the target CPU. Use the same affinity technique as we use for reading/enabling events to closing to optimize the CPU transitions. Before on a large test case with 94 CPUs: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 32.56 3.085463 50 61483 close After: 10.54 0.735704 11 61485 close Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-8-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-29 12:20:45 -03:00
Andi Kleen	99d6141d67	perf evsel: Add functions to close evsel on a CPU Refactor the existing all CPU function to use the per CPU close internally. Export APIs to close per CPU. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-7-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-29 12:20:45 -03:00
Andi Kleen	a8cbe40fe9	perf evsel: Add iterator to iterate over events ordered by CPU Add some common code that is needed to iterate over all events in CPU order. Used in followon patches Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-6-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-29 12:20:45 -03:00
Andi Kleen	a2408a7036	perf evlist: Maintain evlist->all_cpus Maintain a cpumap in the evlist that is the union of all the cpus of the events. This needs a cpumap merge operation, which is added together with tests. v2: Add tests for cpu map merge Fix handling of duplicates Rename _update to _merge Factor out sorting. Fix handling of NULL maps in merge v3: Add comments and empty lines to _merge Committer testing: # perf test "Merge cpu map" 52: Merge cpu map : Ok # Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lore.kernel.org/lkml/20191121001522.180827-5-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-29 12:20:45 -03:00
Andi Kleen	7074674e73	perf cpumap: Maintain cpumaps ordered and without dups Enforce this in _trim() Needed for followon change. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-4-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-29 12:20:45 -03:00
Adrian Hunter	5172672da0	perf script: Fix invalid LBR/binary mismatch error The 'len' returned by grab_bb() includes an extra MAXINSN bytes to allow for the last instruction, so the the final 'offs' will not be 'len'. Fix the error condition logic accordingly. Before: $ perf record -e '{intel_pt//,cpu/mem_inst_retired.all_loads,aux-sample-size=8192/pp}:u' grep -rqs jhgjhg /boot [ perf record: Woken up 19 times to write data ] [ perf record: Captured and wrote 2.274 MB perf.data ] $ perf script -F +brstackinsn --xed --itrace=i1usl100 \| head grep 13759 [002] 8091.310257: 1862 instructions:uH: 5641d58069eb bmexec+0x86b (/bin/grep) bmexec+2485: 00005641d5806b35 jnz 0x5641d5806bd0 # MISPRED 00005641d5806bd0 movzxb (%r13,%rdx,1), %eax 00005641d5806bd6 add %rdi, %rax 00005641d5806bd9 movzxb -0x1(%rax), %edx 00005641d5806bdd cmp %rax, %r14 00005641d5806be0 jnb 0x5641d58069c0 # MISPRED mismatch of LBR data and executable 00005641d58069c0 movzxb (%r13,%rdx,1), %edi After: $ perf script -F +brstackinsn --xed --itrace=i1usl100 \| head grep 13759 [002] 8091.310257: 1862 instructions:uH: 5641d58069eb bmexec+0x86b (/bin/grep) bmexec+2485: 00005641d5806b35 jnz 0x5641d5806bd0 # MISPRED 00005641d5806bd0 movzxb (%r13,%rdx,1), %eax 00005641d5806bd6 add %rdi, %rax 00005641d5806bd9 movzxb -0x1(%rax), %edx 00005641d5806bdd cmp %rax, %r14 00005641d5806be0 jnb 0x5641d58069c0 # MISPRED 00005641d58069c0 movzxb (%r13,%rdx,1), %edi 00005641d58069c6 add %rax, %rdi Fixes: `e98df280bc` ("perf script brstackinsn: Fix recovery from LBR/binary mismatch") Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191127095631.15663-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-28 08:08:38 -03:00
Adrian Hunter	0cd032d3b5	perf script: Fix brstackinsn for AUXTRACE brstackinsn must be allowed to be set by the user when AUX area data has been captured because, in that case, the branch stack might be synthesized on the fly. This fixes the following error: Before: $ perf record -e '{intel_pt//,cpu/mem_inst_retired.all_loads,aux-sample-size=8192/pp}:u' grep -rqs jhgjhg /boot [ perf record: Woken up 19 times to write data ] [ perf record: Captured and wrote 2.274 MB perf.data ] $ perf script -F +brstackinsn --xed --itrace=i1usl100 \| head Display of branch stack assembler requested, but non all-branch filter set Hint: run 'perf record -b ...' After: $ perf record -e '{intel_pt//,cpu/mem_inst_retired.all_loads,aux-sample-size=8192/pp}:u' grep -rqs jhgjhg /boot [ perf record: Woken up 19 times to write data ] [ perf record: Captured and wrote 2.274 MB perf.data ] $ perf script -F +brstackinsn --xed --itrace=i1usl100 \| head grep 13759 [002] 8091.310257: 1862 instructions:uH: 5641d58069eb bmexec+0x86b (/bin/grep) bmexec+2485: 00005641d5806b35 jnz 0x5641d5806bd0 # MISPRED 00005641d5806bd0 movzxb (%r13,%rdx,1), %eax 00005641d5806bd6 add %rdi, %rax 00005641d5806bd9 movzxb -0x1(%rax), %edx 00005641d5806bdd cmp %rax, %r14 00005641d5806be0 jnb 0x5641d58069c0 # MISPRED mismatch of LBR data and executable 00005641d58069c0 movzxb (%r13,%rdx,1), %edi Fixes: `48d02a1d5c` ("perf script: Add 'brstackinsn' for branch stacks") Reported-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191127095322.15417-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-28 08:08:38 -03:00
Andi Kleen	267ed5d859	perf affinity: Add infrastructure to save/restore affinity The kernel perf subsystem has to IPI to the target CPU for many operations. On systems with many CPUs and when managing many events the overhead can be dominated by lots of IPIs. An alternative is to set up CPU affinity in the perf tool, then set up all the events for that CPU, and then move on to the next CPU. Add some affinity management infrastructure to enable such a model. Used in followon patches. Committer notes: Use zfree() in some places, add missing stdbool.h header, some minor coding style changes. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-3-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-28 08:08:38 -03:00
Andi Kleen	d96645821e	perf pmu: Use file system cache to optimize sysfs access pmu.c does a lot of redundant /sys accesses while parsing aliases and probing for PMUs. On large systems with a lot of PMUs this can get expensive (>2s): % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 27.25 1.227847 8 160888 16976 openat 26.42 1.190481 7 164224 164077 stat Add a cache to remember if specific file names exist or don't exist, which eliminates most of this overhead. Also optimize some stat() calls to be slightly cheaper access() Resulting in: 0.18 0.004166 2 1851 305 open 0.08 0.001970 2 829 622 access Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20191121001522.180827-2-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-28 08:08:38 -03:00
Arnaldo Carvalho de Melo	5b596e0ff0	perf regs: Make perf_reg_name() return "unknown" instead of NULL To avoid breaking the build on arches where this is not wired up, at least all the other features should be made available and when using this specific routine, the "unknown" should point the user/developer to the need to wire this up on this particular hardware architecture. Detected in a container mipsel debian cross build environment, where it shows up as: In file included from /usr/mipsel-linux-gnu/include/stdio.h:867, from /git/linux/tools/perf/lib/include/perf/cpumap.h:6, from util/session.c:13: In function 'printf', inlined from 'regs_dump__printf' at util/session.c:1103:3, inlined from 'regs__printf' at util/session.c:1131:2: /usr/mipsel-linux-gnu/include/bits/stdio2.h:107:10: error: '%-5s' directive argument is null [-Werror=format-overflow=] 107 \| return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ()); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cross compiler details: mipsel-linux-gnu-gcc (Debian 9.2.1-8) 9.2.1 20190909 Also on mips64: In file included from /usr/mips64-linux-gnuabi64/include/stdio.h:867, from /git/linux/tools/perf/lib/include/perf/cpumap.h:6, from util/session.c:13: In function 'printf', inlined from 'regs_dump__printf' at util/session.c:1103:3, inlined from 'regs__printf' at util/session.c:1131:2, inlined from 'regs_user__printf' at util/session.c:1139:3, inlined from 'dump_sample' at util/session.c:1246:3, inlined from 'machines__deliver_event' at util/session.c:1421:3: /usr/mips64-linux-gnuabi64/include/bits/stdio2.h:107:10: error: '%-5s' directive argument is null [-Werror=format-overflow=] 107 \| return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ()); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'printf', inlined from 'regs_dump__printf' at util/session.c:1103:3, inlined from 'regs__printf' at util/session.c:1131:2, inlined from 'regs_intr__printf' at util/session.c:1147:3, inlined from 'dump_sample' at util/session.c:1249:3, inlined from 'machines__deliver_event' at util/session.c:1421:3: /usr/mips64-linux-gnuabi64/include/bits/stdio2.h:107:10: error: '%-5s' directive argument is null [-Werror=format-overflow=] 107 \| return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ()); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cross compiler details: mips64-linux-gnuabi64-gcc (Debian 9.2.1-8) 9.2.1 20190909 Fixes: `2bcd355b71` ("perf tools: Add interface to arch registers sets") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-95wjyv4o65nuaeweq31t7l1s@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-28 08:08:38 -03:00
Arnaldo Carvalho de Melo	2b1ac6403f	perf diff: Use llabs() with 64-bit values To fix this build error on a debian mipsel cross build environment: builtin-diff.c: In function 'compute_cycles_diff': builtin-diff.c:649:10: error: absolute value function 'labs' given an argument of type 's64' {aka 'long long int'} but has parameter of type 'long int' which may cause truncation of value [-Werror=absolute-value] 649 \| val = labs(pair->block_info->cycles_spark[i] - \| ^~~~ Fixes: `cebf7d51a6` ("perf diff: Report noisy for cycles diff") Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-pn7szy5uw384ntjgk6zckh6a@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-28 08:08:38 -03:00
Arnaldo Carvalho de Melo	98e9324511	perf diff: Use llabs() with 64-bit values To fix these build errors on a debian mipsel cross build environment: builtin-diff.c: In function 'block_cycles_diff_cmp': builtin-diff.c:550:6: error: absolute value function 'labs' given an argument of type 's64' {aka 'long long int'} but has parameter of type 'long int' which may cause truncation of value [-Werror=absolute-value] 550 \| l = labs(left->diff.cycles); \| ^~~~ builtin-diff.c:551:6: error: absolute value function 'labs' given an argument of type 's64' {aka 'long long int'} but has parameter of type 'long int' which may cause truncation of value [-Werror=absolute-value] 551 \| r = labs(right->diff.cycles); \| ^~~~ Fixes: `99150a1faa` ("perf diff: Use hists to manage basic blocks per symbol") Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-pn7szy5uw384ntjgk6zckh6a@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-28 08:08:37 -03:00
Arnaldo Carvalho de Melo	1fd450f992	libbpf: Fix up generation of bpf_helper_defs.h $ make -C tools/perf build-test does, ends up with these two problems: make[3]: * No rule to make target '/tmp/tmp.zq13cHILGB/perf-5.3.0/include/uapi/linux/bpf.h', needed by 'bpf_helper_defs.h'. Stop. make[3]: * Waiting for unfinished jobs.... make[2]: * [Makefile.perf:757: /tmp/tmp.zq13cHILGB/perf-5.3.0/tools/lib/bpf/libbpf.a] Error 2 make[2]: * Waiting for unfinished jobs.... Because $(srcdir) points to the /tmp/tmp.zq13cHILGB/perf-5.3.0 directory and we need '/tools/ after that variable, and after fixing this then we get to another problem: /bin/sh: /home/acme/git/perf/tools/scripts/bpf_helpers_doc.py: No such file or directory make[3]: * [Makefile:184: bpf_helper_defs.h] Error 127 make[3]: * Deleting file 'bpf_helper_defs.h' LD /tmp/build/perf/libapi-in.o make[2]: * [Makefile.perf:778: /tmp/build/perf/libbpf.a] Error 2 make[2]: * Waiting for unfinished jobs.... Because this requires something outside the tools/ directories that gets collected into perf's detached tarballs, to fix it just add it to tools/perf/MANIFEST, which this patch does, now it works for that case and also for all these other cases. Fixes: `e01a75c159` ("libbpf: Move bpf_{helpers, helper_defs, endian, tracing}.h into libbpf") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-4pnkg2vmdvq5u6eivc887wen@git.kernel.org Link: https://lore.kernel.org/bpf/20191126151045.GB19483@kernel.org	2019-11-27 16:52:31 -08:00
Jiri Olsa	7b65e2034f	perf tools: Allow to link with libbpf dynamicaly Currently we support only static linking with kernel's libbpf (tools/lib/bpf). This patch adds libbpf package detection and support to link perf with it dynamically. The libbpf package status is displayed with: $ make VF=1 Auto-detecting system features: ... ... libbpf: [ on ] It's not checked by default, because it's quite new. Once it's on most distros we can switch it on. For the same reason it's not added to the test-all check. Perf does not need advanced version of libbpf, so we can check just for the base bpf_object__open function. Adding new compile variable to detect libbpf package and link bpf dynamically: $ make LIBBPF_DYNAMIC=1 ... LINK perf $ ldd perf \| grep bpf libbpf.so.0 => /lib64/libbpf.so.0 (0x00007f46818bc000) If libbpf is not installed, build stops with: Makefile.config:486: * Error: No libbpf devel library found,\ please install libbpf-devel. Stop. Committer testing: $ make LIBBPF_DYNAMIC=1 -C tools/perf O=/tmp/build/perf make: Entering directory '/home/acme/git/perf/tools/perf' BUILD: Doing 'make -j8' parallel build Makefile.config:493: * Error: No libbpf devel library found, please install libbpf-devel. Stop. make[1]: * [Makefile.perf:225: sub-make] Error 2 make: * [Makefile:70: all] Error 2 make: Leaving directory '/home/acme/git/perf/tools/perf' $ Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20191126121253.28253-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-26 11:17:45 -03:00
Arnaldo Carvalho de Melo	a5732681e0	perf tests: Rename tests/map_groups.c to tests/maps.c One more step in mergint the maps and map_groups structs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-bw6aagubqxc47m54k2maezfu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-26 11:07:46 -03:00
Arnaldo Carvalho de Melo	6d38267cf9	perf tests: Rename thread-mg-share to thread-maps-share One more step in merging 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-naxsl3g4ou3fyxb8l8e0pn5e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-26 11:07:46 -03:00
Arnaldo Carvalho de Melo	c54d241b35	perf maps: Rename map_groups.h to maps.h One more step in the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-9ibtn3vua76f934t7woyf26w@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-26 11:07:46 -03:00
Arnaldo Carvalho de Melo	9a29ceee6b	perf maps: Rename 'mg' variables to 'maps' Continuing the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-z8d14wrw393a0fbvmnk1bqd9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-26 11:07:46 -03:00
Arnaldo Carvalho de Melo	f2eaea09d6	perf map_symbol: Rename ms->mg to ms->maps One more step on the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-61rra2wg392rhvdgw421wzpt@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-26 11:07:46 -03:00
Arnaldo Carvalho de Melo	694520dfeb	perf addr_location: Rename al->mg to al->maps One more step on the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-foo95pyyp3bhocbt7yd8qrvq@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-26 11:07:46 -03:00
Arnaldo Carvalho de Melo	fe87797dea	perf thread: Rename thread->mg to thread->maps One more step on the merge of 'struct maps' with 'struct map_groups'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-69vcr8pubpym90skxhmbwhiw@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-26 11:07:46 -03:00
Arnaldo Carvalho de Melo	79b6bb73f8	perf maps: Merge 'struct maps' with 'struct map_groups' And pick the shortest name: 'struct maps'. The split existed because we used to have two groups of maps, one for functions and one for variables, but that only complicated things, sometimes we needed to figure out what was at some address and then had to first try it on the functions group and if that failed, fall back to the variables one. That split is long gone, so for quite a while we had only one struct maps per struct map_groups, simplify things by combining those structs. First patch is the minimum needed to merge both, follow up patches will rename 'thread->mg' to 'thread->maps', etc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-hom6639ro7020o708trhxh59@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-26 11:07:46 -03:00
Adrian Hunter	9adab03488	x86/insn: perf tools: Add some more instructions to the new instructions test Add to the "x86 instruction decoder - new instructions" test the following instructions: v4fmaddps v4fmaddss v4fnmaddps v4fnmaddss vaesdec vaesdeclast vaesenc vaesenclast vcvtne2ps2bf16 vcvtneps2bf16 vdpbf16ps gf2p8affineinvqb vgf2p8affineinvqb gf2p8affineqb vgf2p8affineqb gf2p8mulb vgf2p8mulb vp2intersectd vp2intersectq vp4dpwssd vp4dpwssds vpclmulqdq vpcompressb vpcompressw vpdpbusd vpdpbusds vpdpwssd vpdpwssds vpexpandb vpexpandw vpopcntb vpopcntd vpopcntq vpopcntw vpshldd vpshldq vpshldvd vpshldvq vpshldvw vpshldw vpshrdd vpshrdq vpshrdvd vpshrdvq vpshrdvw vpshrdw vpshufbitqmb For information about the instructions, refer Intel SDM May 2019 (325462-070US) and Intel Architecture Instruction Set Extensions May 2019 (319433-037). Committer testing: $ perf test x86 61: x86 rdpmc : Ok 64: x86 instruction decoder - new instructions : Ok 66: x86 bp modify : Ok $ Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20191125125044.31879-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-26 11:07:46 -03:00
Arnaldo Carvalho de Melo	a82f15e39a	perf map: Remove unused functions At some point those stopped being used, prune them. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-p2k98mj3ff2uk1z95sbl5r6e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-26 11:07:46 -03:00
Arnaldo Carvalho de Melo	805fcbc4fb	perf map: Remove needless struct forward declarations At some point we may have needed that, not anymore. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-hnao13231bsl7xml5wn8h4iu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-26 11:07:45 -03:00
Arnaldo Carvalho de Melo	40df3897f0	perf map: Ditch leftover map__reloc_vmlinux() prototype In `39b12f7812` ("perf tools: Make it possible to read object code from vmlinux") the actual function was removed, but we forgot to remove the prototype, fix it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-35yy50cgpcx8cjorluwd5j53@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-26 11:07:45 -03:00
Arnaldo Carvalho de Melo	540a63ea30	perf script: Move map__fprintf_srccode() to near its only user No need to have it elsewhere. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-8cw846pudpxo0xdkvi9qnvrh@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-26 11:07:45 -03:00
Ingo Molnar	ceb9e77324	Merge branch 'x86/core' into perf/core, to resolve conflicts and to pick up completed topic tree Conflicts: tools/perf/check-headers.sh Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-25 09:09:27 +01:00
Ian Rogers	4584f084aa	perf parse: Fix potential memory leak when handling tracepoint errors An error may be in place when tracepoint_error is called, use parse_events__handle_error to avoid a memory leak and to capture the first and last error. Error detected by LLVM's libFuzzer using the following event: $ perf stat -e 'msr/event/,f:e' event syntax error: 'msr/event/,f:e' \___ can't access trace events Error: No permissions to read /sys/kernel/debug/tracing/events/f/e Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing/' Initial error: event syntax error: 'msr/event/,f:e' \___ no value assigned for term Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20191120180925.21787-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-22 10:48:14 -03:00
Colin Ian King	358f98ee8a	perf probe: Fix spelling mistake "addrees" -> "address" There is a spelling mistake in a pr_warning message. Fix it. Signed-off-by: Colin King <colin.king@canonical.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-janitors@vger.kernel.org Link: http://lore.kernel.org/lkml/20191121092623.374896-1-colin.king@canonical.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-22 10:48:14 -03:00
Adrian Hunter	32a1ece4bd	perf intel-bts: Does not support AUX area sampling Add an error message because Intel BTS does not support AUX area sampling. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-16-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-22 10:48:13 -03:00
Adrian Hunter	dbd134322e	perf intel-pt: Add support for decoding AUX area samples Add support for dumping, queuing and decoding AUX area samples. Decoding samples is the same as regular decoding, except in the case where there are no timestamps, in which case buffers are decoded immediately before the sample event. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-15-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-22 10:48:13 -03:00
Adrian Hunter	c4ab2f0f76	perf intel-pt: Add support for recording AUX area samples Set up the default number of mmap pages, default sample size and default psb_period for AUX area sampling. Add documentation also. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-14-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-22 10:48:13 -03:00
Adrian Hunter	a1ac7de690	perf pmu: When using default config, record which bits of config were changed by the user Default config for a PMU is defined before selected events are parsed. That allows the user-entered config to override the default config. However that does not allow for changing the default config based on other options. For example, if the user chooses AUX area sampling mode, in the case of Intel PT, the psb_period needs to be small for sampling, so there is a need to set the default psb_period to 0 (2 KiB) in that case. However that should not override a value set by the user. To allow for that, when using default config, record which bits of config were changed by the user. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-13-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-22 10:48:13 -03:00
Adrian Hunter	ac2f445fc8	perf auxtrace: Add support for queuing AUX area samples Add functions to queue AUX area samples in advance (auxtrace_queue_data()) or individually (auxtrace_queues__add_sample()) or find out what queue a sample belongs on (auxtrace_queues__sample_queue()). auxtrace_queue_data() can also queue snapshot data which keeps snapshots and samples ordered with respect to each other in case support for that is desired. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-12-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-22 10:48:13 -03:00
Adrian Hunter	103ed40e4b	perf session: Add facility to peek at all events AUX area samples are not limited in how far back in time the sample could start. Consequently samples must be queued in advance to allow for time-ordered processing. To achieve that, add perf_session__peek_events() that walks and peeks at all the events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-11-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-22 10:48:13 -03:00
Adrian Hunter	b04b8dd1e4	perf auxtrace: Add support for dumping AUX area samples Add support for dumping AUX area samples i.e. via the perf script/report -D (--dump-raw-trace) option. Committer notes: Add __maybe_unused to the two args for auxtrace__dump_auxtrace_sample() for when we don't HAVE_AUXTRACE_SUPPORT. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-10-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-22 10:48:13 -03:00
Adrian Hunter	ba2675bf15	perf inject: Cut AUX area samples After decoding AUX area samples, the AUX area data is no longer needed (having been replaced by synthesized events) so cut it out. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-9-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-22 10:48:13 -03:00
Adrian Hunter	eb7a52d46c	perf record: Add aux-sample-size config term To allow individual events to be selected for AUX area sampling, add aux-sample-size config term. attr.aux_sample_size is updated by auxtrace_parse_sample_options() so that the existing validation will see the value. Any event that has a non-zero aux_sample_size will cause AUX area sampling to be configured, irrespective of the --aux-sample option. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-22 10:48:13 -03:00
Adrian Hunter	c0a6de06c4	perf record: Add support for AUX area sampling Add a 'perf record' option '--aux-sample' to request AUX area sampling. AUX area sampling uses an overwriting buffer much like snapshot mode, so adjust the AUX buffer mmapping accordingly. To make it easy to queue samples for decoding, synthesize an ID index. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-22 10:48:13 -03:00
Adrian Hunter	f0bb7ee853	perf auxtrace: Add support for AUX area sample recording Add support for parsing and validating AUX area sample options. At present, the only option is the sample size, but it is also necessary to ensure that events are in a group with an AUX area event as the leader. Committer note: Add missing 'static inline' in front of auxtrace_parse_sample_options() for when we don't HAVE_AUXTRACE_SUPPORT. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-22 10:48:13 -03:00
Adrian Hunter	f306de275b	perf auxtrace: Move perf_evsel__find_pmu() Move perf_evsel__find_pmu() so it can be used without forward declaration. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-22 10:48:13 -03:00
Adrian Hunter	9bca1a4ef5	perf record: Add a function to test for kernel support for AUX area sampling Architectures are expected to know if AUX area sampling is supported by the hardware. Add a function perf_can_aux_sample() which will determine whether the kernel supports it. Committer notes: I reported that this message was taking place on a kernel without the required bits: # perf record --aux-sample -e '{intel_pt//u,branch-misses:u}' Error: The sys_perf_event_open() syscall returned with 7 (Argument list too long) for event (branch-misses:u). /bin/dmesg \| grep -i perf may provide additional information. Adrian sent a patch addressing it, with this explanation: ---- perf_can_aux_sample_size() always returned true because it did not pass the attribute size to sys_perf_event_open, nor correctly check the return value and errno. ---- After applying it I get, later in the series, when --aux-sample is added: # perf record --aux-sample -e '{intel_pt//u,branch-misses:u}' AUX area sampling is not supported by kernel Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-22 10:43:24 -03:00
Adrian Hunter	98dcf14d7f	perf tools: Add kernel AUX area sampling definitions Add kernel AUX area sampling definitions, which brings perf_event.h into line with the kernel version. New sample type PERF_SAMPLE_AUX requests a sample of the AUX area buffer. New perf_event_attr member 'aux_sample_size' specifies the desired size of the sample. Also add support for parsing samples containing AUX area data i.e. PERF_SAMPLE_AUX. Committer notes: I squashed the first two patches in this series to avoid breaking automatic bisection, i.e. after applying only the original first patch in this series we would have: # perf test -v parsing 26: Sample parsing : --- start --- test child forked, pid 17018 sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating test child finished with -1 ---- end ---- Sample parsing: FAILED! # With the two paches combined: # perf test parsing 26: Sample parsing : Ok # Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-21 10:54:20 -03:00
Jin Yao	848a5e507e	perf report: Jump to symbol source view from total cycles view This patch supports jumping from tui total cycles view to symbol source view. For example, perf record -b ./div perf report --total-cycles In total cycles view, we can select one entry and press 'a' or press ENTER key to jump to symbol source view. This patch also sets sort_order to NULL in cmd_report() which will use the default branch sort order. The percent value in new annotate view will be consistent with the percent in annotate view switched from perf report (we observed the original percent gap with previous patches). v2: --- Fix the 'make NO_SLANG=1' error. (set __maybe_unused to annotation_opts in block_hists_tui_browse()). Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191118140849.20714-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-19 19:37:04 -03:00
Jin Yao	5cb456af99	perf util: Move block TUI function to ui browsers It would be nice if we could jump to the assembler/source view (like the normal perf report) from total cycles view. This patch moves the block_hists_tui_browse from block-info.c to ui/browsers/hists.c in order to reuse some browser codes (i.e do_annotate) for implementing new annotation view. v2: --- Fix the 'make NO_SLANG=1' error. (Change 'int block_hists_tui_browse()' to 'static inline int block_hists_tui_browse()') Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20191118140849.20714-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-19 19:33:40 -03:00
Alexey Budankov	bb1835a3b8	perf session: Fix decompression of PERF_RECORD_COMPRESSED records Avoid termination of trace loading in case the last record in the decompressed buffer partly resides in the following mmaped PERF_RECORD_COMPRESSED record. In this case NULL value returned by fetch_mmaped_event() means to proceed to the next mmaped record then decompress it and load compressed events. The issue can be reproduced like this: $ perf record -z -- some_long_running_workload $ perf report --stdio -vv decomp (B): 44519 to 163000 decomp (B): 48119 to 174800 decomp (B): 65527 to 131072 fetch_mmaped_event: head=0x1ffe0 event->header_size=0x28, mmap_size=0x20000: fuzzed perf.data? Error: failed to process sample ... Testing: 71: Zstd perf.data compression/decompression : Ok $ tools/perf/perf report -vv --stdio decomp (B): 59593 to 262160 decomp (B): 4438 to 16512 decomp (B): 285 to 880 Looking at the vmlinux_path (8 entries long) Using vmlinux for symbols decomp (B): 57474 to 261248 prefetch_event: head=0x3fc78 event->header_size=0x28, mmap_size=0x3fc80: fuzzed or compressed perf.data? decomp (B): 25 to 32 decomp (B): 52 to 120 ... Fixes: `57fc032ad6` ("perf session: Avoid infinite loop when seeing invalid header.size") Link: https://marc.info/?l=linux-kernel&m=156580812427554&w=2 Co-developed-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/cf782c34-f3f8-2f9f-d6ab-145cee0d5322@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-19 19:31:55 -03:00
Arnaldo Carvalho de Melo	0e3149f86b	perf dso: Move dso_id from 'struct map' to 'struct dso' And take it into account when looking up DSOs when we have the dso_id fields obtained from somewhere, like from PERF_RECORD_MMAP2 records. Instances of struct map pointing to the same DSO pathname but with anything in dso_id different are in fact different DSOs, so better have different 'struct dso' instances to reflect that. At some point we may want to get copies of the contents of the different objects if we want to do correct annotation or other analysis. With this we get 'struct map' 24 bytes leaner: $ pahole -C map ~/bin/perf struct map { union { struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 / struct list_head node; / 0 16 / } __attribute__((__aligned__(8))); / 0 24 / u64 start; / 24 8 / u64 end; / 32 8 / _Bool erange_warned:1; / 40: 0 1 / _Bool priv:1; / 40: 1 1 / / XXX 6 bits hole, try to pack / / XXX 3 bytes hole, try to pack / u32 prot; / 44 4 / u64 pgoff; / 48 8 / u64 reloc; / 56 8 / / --- cacheline 1 boundary (64 bytes) --- / u64 (map_ip)(struct map , u64); / 64 8 / u64 (unmap_ip)(struct map , u64); / 72 8 / struct dso dso; /* 80 8 / refcount_t refcnt; / 88 4 / u32 flags; / 92 4 / / size: 96, cachelines: 2, members: 13 / / sum members: 92, holes: 1, sum holes: 3 / / sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits / / forced alignments: 1 / / last cacheline: 32 bytes */ } __attribute__((__aligned__(8))); $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-g4hxxmraplo7wfjmk384mfsb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-19 19:12:26 -03:00
Arnaldo Carvalho de Melo	1f74b100c9	perf dsos: Remove unused dsos__find() method Not used anywhere, nuke it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-teqz0eqcw43mnt7i3me44esw@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-19 17:51:34 -03:00
Arnaldo Carvalho de Melo	7b59a82493	perf map: Move comparision of map's dso_id to a separate function We'll use it when doing DSO lookups using dso_ids. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-u2nr1oq03o0i29w2ay9jx03s@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-19 16:30:56 -03:00
Arnaldo Carvalho de Melo	4a7380a52e	perf map: Pass a dso_id to map__new() Instead of the 4 fields, a step in the direction of moving this to struct dso. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-gp5s1xgxacurmih5d1l94ymy@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-19 15:09:26 -03:00
Arnaldo Carvalho de Melo	99459a84d5	perf map: Move maj/min/ino/ino_generation to separate struct And this patch highlights where these fields are being used: in the sort order where it uses it to compare maps and classify samples taking into account not just the DSO, but those DSO id fields. I think these should be used to differentiate DSOs with the same name but different 'struct dso_id' fields, i.e. these fields should move to 'struct dso' and then be used as part of the key when doing lookups for DSOs, in addition to the DSO name. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-8v5isitqy0dup47nnwkpc80f@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-19 15:09:26 -03:00
Ian Rogers	a910e4666d	perf parse: Report initial event parsing error Record the first event parsing error and report. Implementing feedback from Jiri Olsa: https://lkml.org/lkml/2019/10/28/680 An example error is: $ tools/perf/perf stat -e c/c/ WARNING: multiple event parsing errors event syntax error: 'c/c/' \___ unknown term valid terms: event,filter_rem,filter_opc0,edge,filter_isoc,filter_tid,filter_loc,filter_nc,inv,umask,filter_opc1,tid_en,thresh,filter_all_op,filter_not_nm,filter_state,filter_nm,config,config1,config2,name,period,percore Initial error: event syntax error: 'c/c/' \___ Cannot find PMU `c'. Missing kernel support? Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Allison Randal <allison@lohutok.net> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: http://lore.kernel.org/lkml/20191116074652.9960-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 19:14:29 -03:00
Masami Hiramatsu	cb40273085	perf probe: Trace a magic number if variable is not found Trace a magic number as immediate value if the target variable is not found at some probe points which is based on one probe event. This feature is good for the case if you trace a source code line with some local variables, which is compiled into several instructions and some of the variables are optimized out on some instructions. Even if so, with this feature, perf probe trace a magic number instead of such disappeared variables and fold those probes on one event. E.g. without this patch: # perf probe -D "pud_page_vaddr pud" Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. Failed to find 'pud' in this function. p:probe/pud_page_vaddr _text+23480787 pud=%ax:x64 p:probe/pud_page_vaddr _text+23808453 pud=%bp:x64 p:probe/pud_page_vaddr _text+23558082 pud=%ax:x64 p:probe/pud_page_vaddr _text+328373 pud=%r8:x64 p:probe/pud_page_vaddr _text+348448 pud=%bx:x64 p:probe/pud_page_vaddr _text+23816818 pud=%bx:x64 With this patch: # perf probe -D "pud_page_vaddr pud" \| head spurious_kernel_fault is blacklisted function, skip it. vmalloc_fault is blacklisted function, skip it. p:probe/pud_page_vaddr _text+23480787 pud=%ax:x64 p:probe/pud_page_vaddr _text+149051 pud=\deade12d:x64 p:probe/pud_page_vaddr _text+23808453 pud=%bp:x64 p:probe/pud_page_vaddr _text+315926 pud=\deade12d:x64 p:probe/pud_page_vaddr _text+23807209 pud=\deade12d:x64 p:probe/pud_page_vaddr _text+23557365 pud=%ax:x64 p:probe/pud_page_vaddr _text+314097 pud=%di:x64 p:probe/pud_page_vaddr _text+314015 pud=\deade12d:x64 p:probe/pud_page_vaddr _text+313893 pud=\deade12d:x64 p:probe/pud_page_vaddr _text+324083 pud=\deade12d:x64 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157406476931.24476.6261475888681844285.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 19:09:23 -03:00
Masami Hiramatsu	66f69b2197	perf probe: Support DW_AT_const_value constant value Support DW_AT_const_value for variable assignment instead of location. Note that this requires ftrace supporting immediate value. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157406476012.24476.16096289871757175775.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 19:08:02 -03:00
Masami Hiramatsu	72363540c0	perf probe: Support multiprobe event Support multiprobe event if the event is based on function and lines and kernel supports it. In this case, perf probe creates the first probe with an event, and tries to append following probes on that event, since those probes must be on the same source code line. Before this patch; # perf probe -a vfs_read:18 Added new events: probe:vfs_read_L18 (on vfs_read:18) probe:vfs_read_L18_1 (on vfs_read:18) You can now use it in all perf tools, such as: perf record -e probe:vfs_read_L18_1 -aR sleep 1 # After this patch (on multiprobe supported kernel) # perf probe -a vfs_read:18 Added new events: probe:vfs_read_L18 (on vfs_read:18) probe:vfs_read_L18 (on vfs_read:18) You can now use it in all perf tools, such as: perf record -e probe:vfs_read_L18 -aR sleep 1 # Committer testing: On a kernel that doesn't support multiprobe events, after this patch: # uname -a Linux quaco 5.3.8-200.fc30.x86_64 #1 SMP Tue Oct 29 14:46:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux # grep append /sys/kernel/debug/tracing/README be modified by appending '.descending' or '.ascending' to a can be modified by appending any of the following modifiers # # perf probe -a vfs_read:18 Added new events: probe:vfs_read_L18 (on vfs_read:18) probe:vfs_read_L18_1 (on vfs_read:18) You can now use it in all perf tools, such as: perf record -e probe:vfs_read_L18_1 -aR sleep 1 # perf probe -l probe:vfs_read_L18 (on vfs_read:18@fs/read_write.c) probe:vfs_read_L18_1 (on vfs_read:18@fs/read_write.c) # Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157406475010.24476.586290752591512351.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 19:03:38 -03:00
Masami Hiramatsu	15354d5469	perf probe: Generate event name with line number Generate event name from function name with line number as <function>_L<line_number>. Note that this is only for the new event which is defined by the line number of function (except for line 0). If there is another event on same line, you have to use "-f" option. In that case, the new event has "_1" suffix. e.g. # perf probe -a kernel_read:2 Added new event: probe:kernel_read_L2 (on kernel_read:2) You can now use it in all perf tools, such as: perf record -e probe:kernel_read_L2 -aR sleep 1 But if we omit the line number or 0th line, it will have no suffix. # perf probe -a kernel_read:0 Added new event: probe:kernel_read (on kernel_read) You can now use it in all perf tools, such as: perf record -e probe:kernel_read -aR sleep 1 probe:kernel_read (on kernel_read@linux-5.0.0/fs/read_write.c) probe:kernel_read_L2 (on kernel_read:2@linux-5.0.0/fs/read_write.c) Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157406474026.24476.2828897745502059569.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 19:02:00 -03:00
Masami Hiramatsu	499144c83d	perf probe: Do not show non representive lines by perf-probe -L Since perf probe -L shows non representive lines, it can be mislead users where user can put probes. This prevents to show such non representive lines so that user can understand which lines user can probe. # perf probe -L kernel_read <kernel_read@/build/linux-pvZVvI/linux-5.0.0/fs/read_write.c:0> 0 ssize_t kernel_read(struct file file, void buf, size_t count, loff_t pos) { 2 mm_segment_t old_fs; ssize_t result; old_fs = get_fs(); 6 set_fs(get_ds()); / The cast to a user pointer is valid due to the set_fs() / 8 result = vfs_read(file, (void __user )buf, count, pos); 9 set_fs(old_fs); 10 return result; } EXPORT_SYMBOL(kernel_read); Committer testing: Before: # perf probe -L kernel_read <kernel_read@/usr/src/debug/kernel-5.3.fc30/linux-5.3.8-200.fc30.x86_64/fs/read_write.c:0> 0 ssize_t kernel_read(struct file file, void buf, size_t count, loff_t pos) 1 { 2 mm_segment_t old_fs; 3 ssize_t result; 5 old_fs = get_fs(); 6 set_fs(KERNEL_DS); / The cast to a user pointer is valid due to the set_fs() / 8 result = vfs_read(file, (void __user )buf, count, pos); 9 set_fs(old_fs); 10 return result; } EXPORT_SYMBOL(kernel_read); # See the 1, 3, 5 lines? They shouldn't be there, after this patch: # perf probe -L kernel_read <kernel_read@/usr/src/debug/kernel-5.3.fc30/linux-5.3.8-200.fc30.x86_64/fs/read_write.c:0> 0 ssize_t kernel_read(struct file file, void buf, size_t count, loff_t pos) { 2 mm_segment_t old_fs; ssize_t result; old_fs = get_fs(); 6 set_fs(KERNEL_DS); / The cast to a user pointer is valid due to the set_fs() / 8 result = vfs_read(file, (void __user )buf, count, pos); 9 set_fs(old_fs); 10 return result; } EXPORT_SYMBOL(kernel_read); # Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157406473064.24476.2913278267727587314.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 18:59:36 -03:00
Masami Hiramatsu	1ae5d88a4e	perf probe: Verify given line is a representive line Verify user given probe line is a representive line (which doesn't share the address with other lines or the line is the least line among the lines which shares same address), and if not, it shows what is the representive line. Without this fix, user can put a probe on the lines which is not a a representive line. But since this is not a representive line, perf probe -l shows a representive line number instead of user given line number. e.g. (put kernel_read:3, but listed as kernel_read:2) # perf probe -a kernel_read:3 Added new event: probe:kernel_read (on kernel_read:3) You can now use it in all perf tools, such as: perf record -e probe:kernel_read -aR sleep 1 # perf probe -l probe:kernel_read (on kernel_read:2@linux-5.0.0/fs/read_write.c) With this fix, perf probe doesn't allow user to put a probe on a representive line, and tell what is the representive line. # perf probe -a kernel_read:3 This line is sharing the addrees with other lines. Please try to probe at kernel_read:2 instead. Error: Failed to add events. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157406472071.24476.14915451439785001021.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 18:58:25 -03:00
Masami Hiramatsu	57f95bf5f8	perf probe: Show correct statement line number by perf probe -l The dwarf_getsrc_die() can return the line which is not a statement nor the least line number among the lines which shares same address. This can lead perf probe --list shows incorrect line number for probed address. To fix this, this introduces cu_getsrc_die() which returns only a statement line and which is the least line number (we call it the representive line for an address), and use it in cu_find_lineinfo(). Also, if the given address is the entry address of a real function, cu_find_lineinfo() returns the function declared line number instead of the start line number of the function body. For example, without this change perf probe -l shows incorrect line as below. # perf probe -a kernel_read:2 Added new event: probe:kernel_read (on kernel_read:2) You can now use it in all perf tools, such as: perf record -e probe:kernel_read -aR sleep 1 # perf probe -l probe:kernel_read (on kernel_read:1@linux-5.0.0/fs/read_write.c) With this fix, it shows correct line number as below; # perf probe -l probe:kernel_read (on kernel_read:2@linux-5.0.0/fs/read_write.c) Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Link: http://lore.kernel.org/lkml/157406471067.24476.17463149618465494448.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 18:56:27 -03:00
Adrian Hunter	1e5f015442	x86/insn: perf tools: Add some instructions to the new instructions test Add to the "x86 instruction decoder - new instructions" test the following instructions: cldemote tpause umonitor umwait movdiri movdir64b enqcmd enqcmds encls enclu enclv pconfig wbnoinvd For information about the instructions, refer Intel SDM May 2019 (325462-070US) and Intel Architecture Instruction Set Extensions May 2019 (319433-037). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20191115135447.6519-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 18:53:54 -03:00
Arnaldo Carvalho de Melo	7624e69465	perf map: Move seldom used ->flags field to second cacheline So we start with: $ pahole -C map ~/bin/perf struct map { union { struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 / struct list_head node; / 0 16 / } __attribute__((__aligned__(8))); / 0 24 / u64 start; / 24 8 / u64 end; / 32 8 / _Bool erange_warned:1; / 40: 0 1 / _Bool priv:1; / 40: 1 1 / / XXX 6 bits hole, try to pack / / XXX 3 bytes hole, try to pack / u32 prot; / 44 4 / u32 flags; / 48 4 / / XXX 4 bytes hole, try to pack / u64 pgoff; / 56 8 / / --- cacheline 1 boundary (64 bytes) --- / u64 reloc; / 64 8 / u32 maj; / 72 4 / u32 min; / 76 4 / u64 ino; / 80 8 / u64 ino_generation; / 88 8 / u64 (map_ip)(struct map , u64); / 96 8 / u64 (unmap_ip)(struct map , u64); / 104 8 / struct dso dso; /* 112 8 / refcount_t refcnt; / 120 4 / / size: 128, cachelines: 2, members: 17 / / sum members: 116, holes: 2, sum holes: 7 / / sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits / / padding: 4 / / forced alignments: 1 / } __attribute__((__aligned__(8))); $ and 'flags' is seldom used when printing details about the map or with the "cacheline" sort order, we can move them it to the second cacheline, that will allow combining it with 'refcnt', that is only four bytes: $ pahole -C map ~/bin/perf struct map { union { struct rb_node rb_node __attribute__((__aligned__(8))); / 0 24 / struct list_head node; / 0 16 / } __attribute__((__aligned__(8))); / 0 24 / u64 start; / 24 8 / u64 end; / 32 8 / _Bool erange_warned:1; / 40: 0 1 / _Bool priv:1; / 40: 1 1 / / XXX 6 bits hole, try to pack / / XXX 3 bytes hole, try to pack / u32 prot; / 44 4 / u64 pgoff; / 48 8 / u64 reloc; / 56 8 / / --- cacheline 1 boundary (64 bytes) --- / u32 maj; / 64 4 / u32 min; / 68 4 / u64 ino; / 72 8 / u64 ino_generation; / 80 8 / u64 (map_ip)(struct map , u64); / 88 8 / u64 (unmap_ip)(struct map , u64); / 96 8 / struct dso dso; /* 104 8 / refcount_t refcnt; / 112 4 / u32 flags; / 116 4 / / size: 120, cachelines: 2, members: 17 / / sum members: 116, holes: 1, sum holes: 3 / / sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits / / forced alignments: 1 / / last cacheline: 56 bytes */ } __attribute__((__aligned__(8))); $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-2cdw3zlw1mkamaf7nqtdlxfi@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 16:51:00 -03:00
Arnaldo Carvalho de Melo	dbc984c961	perf map: Use bitmap for booleans The map->priv and map->erange_warned are seldom used, the first only in tests/vmlinux-kallsyms.c, the later only when hist_entry__inc_addr_samples() returns -ERANGE in 'perf top', which are really rare occasions, so make them a bool bitfield. This will open up space for other members on the first cacheline. $ pahole -C map ~/bin/perf struct map { union { struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 / struct list_head node; / 0 16 / } __attribute__((__aligned__(8))); / 0 24 / u64 start; / 24 8 / u64 end; / 32 8 / _Bool erange_warned:1; / 40: 0 1 / _Bool priv:1; / 40: 1 1 / / XXX 6 bits hole, try to pack / / XXX 3 bytes hole, try to pack / u32 prot; / 44 4 / u32 flags; / 48 4 / / XXX 4 bytes hole, try to pack / u64 pgoff; / 56 8 / / --- cacheline 1 boundary (64 bytes) --- / u64 reloc; / 64 8 / u32 maj; / 72 4 / u32 min; / 76 4 / u64 ino; / 80 8 / u64 ino_generation; / 88 8 / u64 (map_ip)(struct map , u64); / 96 8 / u64 (unmap_ip)(struct map , u64); / 104 8 / struct dso dso; /* 112 8 / refcount_t refcnt; / 120 4 / / size: 128, cachelines: 2, members: 17 / / sum members: 116, holes: 2, sum holes: 7 / / sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits / / padding: 4 / / forced alignments: 1 */ } __attribute__((__aligned__(8))); $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-g5545pcq4ff0wr17tfb1piqt@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 16:29:01 -03:00
Adrian Hunter	aceb98261e	perf callchain: Fix segfault in thread__resolve_callchain_sample() Do not dereference 'chain' when it is NULL. $ perf record -e intel_pt//u -e branch-misses:u uname $ perf report --itrace=l --branch-history perf: Segmentation fault Fixes: `e9024d519d` ("perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc}") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191114142538.4097-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 13:01:59 -03:00
Arnaldo Carvalho de Melo	a7c2b572e2	perf map_groups: Auto sort maps by name, if needed There are still lots of lookups by name, even if just when loading vmlinux, till that code is studied to figure out if its possible to do away with those map lookup by names, provide a way to sort it using libc's qsort/bsearch. Doing it at the first lookup defers the sorting a bit, and as the code stands now, is never done for user maps, just for the kernel ones. # perf probe -l # perf probe -x ~/bin/perf -L __map_groups__find_by_name <__map_groups__find_by_name@/home/acme/git/perf/tools/perf/util/symbol.c:0> 0 static struct map __map_groups__find_by_name(struct map_groups mg, const char name) 1 { struct map mapp; 4 if (mg->maps_by_name == NULL && 5 map__groups__sort_by_name_from_rbtree(mg)) 6 return NULL; 8 mapp = bsearch(name, mg->maps_by_name, mg->nr_maps, sizeof(mapp), map__strcmp_name); 9 if (mapp) 10 return mapp; 11 return NULL; 12 } struct map map_groups__find_by_name(struct map_groups mg, const char name) { # perf probe -x ~/bin/perf 'found=__map_groups__find_by_name:10 name:string' Added new event: probe_perf:found (on __map_groups__find_by_name:10 in /home/acme/bin/perf with name:string) You can now use it in all perf tools, such as: perf record -e probe_perf:found -aR sleep 1 # # perf probe -x ~/bin/perf -L map_groups__find_by_name <map_groups__find_by_name@/home/acme/git/perf/tools/perf/util/symbol.c:0> 0 struct map map_groups__find_by_name(struct map_groups mg, const char name) 1 { 2 struct maps maps = &mg->maps; struct map map; 5 down_read(&maps->lock); 7 if (mg->last_search_by_name && strcmp(mg->last_search_by_name->dso->short_name, name) == 0) { 8 map = mg->last_search_by_name; 9 goto out_unlock; } / * If we have mg->maps_by_name, then the name isn't in the rbtree, * as mg->maps_by_name mirrors the rbtree when lookups by name are * made. / 16 map = __map_groups__find_by_name(mg, name); 17 if (map \|\| mg->maps_by_name != NULL) 18 goto out_unlock; / Fallback to traversing the rbtree... / 21 maps__for_each_entry(maps, map) 22 if (strcmp(map->dso->short_name, name) == 0) { 23 mg->last_search_by_name = map; 24 goto out_unlock; } 27 map = NULL; out_unlock: 30 up_read(&maps->lock); 31 return map; 32 } int dso__load_vmlinux(struct dso dso, struct map map, const char vmlinux, bool vmlinux_allocated) # perf probe -x ~/bin/perf 'fallback=map_groups__find_by_name:21 name:string' Added new events: probe_perf:fallback (on map_groups__find_by_name:21 in /home/acme/bin/perf with name:string) probe_perf:fallback_1 (on map_groups__find_by_name:21 in /home/acme/bin/perf with name:string) You can now use it in all perf tools, such as: perf record -e probe_perf:fallback_1 -aR sleep 1 # # perf probe -l probe_perf:fallback (on map_groups__find_by_name:21@util/symbol.c in /home/acme/bin/perf with name_string) probe_perf:fallback_1 (on map_groups__find_by_name:21@util/symbol.c in /home/acme/bin/perf with name_string) probe_perf:found (on __map_groups__find_by_name:10@util/symbol.c in /home/acme/bin/perf with name_string) # # perf stat -e probe_perf:* Now run 'perf top' in another term and then, after a while, stop 'perf stat': Furthermore, if we ask for interval printing, we can see that that is done just at the start of the workload: # perf stat -I1000 -e probe_perf:* # time counts unit events 1.000319513 0 probe_perf:found 1.000319513 0 probe_perf:fallback_1 1.000319513 0 probe_perf:fallback 2.001868092 23,251 probe_perf:found 2.001868092 0 probe_perf:fallback_1 2.001868092 0 probe_perf:fallback 3.002901597 0 probe_perf:found 3.002901597 0 probe_perf:fallback_1 3.002901597 0 probe_perf:fallback 4.003358591 0 probe_perf:found 4.003358591 0 probe_perf:fallback_1 4.003358591 0 probe_perf:fallback ^C # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-c5lmbyr14x448rcfii7y6t3k@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 13:01:58 -03:00
Arnaldo Carvalho de Melo	a94ab91a54	perf machine: No need to check if kernel module maps pre-exist We'only populating maps for kernel modules either from perf.data file PERF_RECORD_MMAP records or when parsing /proc/modules, so there is no need to first look if we already have those module maps in the list, that would mean the kernel has duplicate entries. So ditch one use of looking up maps by name. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-gnzjg2hhuz6jnrw91m35059y@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 13:01:50 -03:00
Arnaldo Carvalho de Melo	6e0a9b3dfa	perf record: No need to process the synthesized MMAP events twice At the end of a 'perf record' session, by default, we'll process all samples and populate the threads, maps, etc so as to find out which of the DSOs got samples, to reduce the size of the build-id table we'll add to the perf.data headers. But we don't need to process the PERF_RECORD_MMAP events synthesized for the kernel modules, as we have those already via perf_session__create_kernel_maps(), so add mmap/mmap2 handlers that first look at event->header.misc to see if the event is for a user map, bailing out if not. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-mofoxvcx2dryppcw3o689jdd@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 11:21:32 -03:00
Arnaldo Carvalho de Melo	f068435d9b	perf map: No need to adjust the long name of modules At some point in the past we needed to make sure we would get the long name of modules and not just what we get from /proc/modules, but that need, as described in the cset that introduced the adjustment function: Fixes: `c03d5184f0` ("perf machine: Adjust dso->long_name for offline module") Without using the buildid-cache: # lsmod \| grep trusted # insmod trusted.ko # lsmod \| grep trusted trusted 24576 0 # strace -e open,openat perf probe -m ./trusted.ko key_seal \|& grep trusted openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 4 openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 7 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/.debug/root/trusted.ko/dd3d355d567394d540f527e093e0f64b95879584/probes", O_RDWR\|O_CREAT, 0644) = 3 openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/.debug/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, ".debug/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 4 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 probe:key_seal (on key_seal in trusted) # perf probe -l probe:key_seal (on key_seal in trusted) # No attempt at opening '[trusted]'. Now using the build-id cache: # rmmod trusted # perf buildid-cache --add ./trusted.ko # insmod trusted.ko # strace -e open,openat perf probe -m ./trusted.ko key_seal \|& grep trusted openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 4 openat(AT_FDCWD, "/sys/module/trusted/notes/.note.gnu.build-id", O_RDONLY) = 7 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/.debug/root/trusted.ko/dd3d355d567394d540f527e093e0f64b95879584/probes", O_RDWR\|O_CREAT, 0644) = 3 openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/usr/lib/debug/root/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/.debug/trusted.ko", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, ".debug/trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "trusted.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 4 openat(AT_FDCWD, "/root/trusted.ko", O_RDONLY) = 3 # Again, no attempt at reading '[trusted]'. Finally, adding a probe to that function and then using: [root@quaco ~]# perf trace -e probe_perf:/max-stack=16/ --max-events=2 0.000 perf/13456 probe_perf:dso__adjust_kmod_long_name(__probe_ip: 5492263) dso__adjust_kmod_long_name (/home/acme/bin/perf) machine__process_kernel_mmap_event (/home/acme/bin/perf) machine__process_mmap_event (/home/acme/bin/perf) perf_event__process_mmap (/home/acme/bin/perf) machines__deliver_event (/home/acme/bin/perf) perf_session__deliver_event (/home/acme/bin/perf) perf_session__process_event (/home/acme/bin/perf) process_simple (/home/acme/bin/perf) reader__process_events (/home/acme/bin/perf) __perf_session__process_events (/home/acme/bin/perf) perf_session__process_events (/home/acme/bin/perf) process_buildids (/home/acme/bin/perf) record__finish_output (/home/acme/bin/perf) __cmd_record (/home/acme/bin/perf) cmd_record (/home/acme/bin/perf) run_builtin (/home/acme/bin/perf) 0.055 perf/13456 probe_perf:dso__adjust_kmod_long_name(__probe_ip: 5492263) dso__adjust_kmod_long_name (/home/acme/bin/perf) machine__process_kernel_mmap_event (/home/acme/bin/perf) machine__process_mmap_event (/home/acme/bin/perf) perf_event__process_mmap (/home/acme/bin/perf) machines__deliver_event (/home/acme/bin/perf) perf_session__deliver_event (/home/acme/bin/perf) perf_session__process_event (/home/acme/bin/perf) process_simple (/home/acme/bin/perf) reader__process_events (/home/acme/bin/perf) __perf_session__process_events (/home/acme/bin/perf) perf_session__process_events (/home/acme/bin/perf) process_buildids (/home/acme/bin/perf) record__finish_output (/home/acme/bin/perf) __cmd_record (/home/acme/bin/perf) cmd_record (/home/acme/bin/perf) run_builtin (/home/acme/bin/perf) # This was the only path I could find using the perf tools that reach at this function, then as of november/2019, if we put a probe in the line where the actuall setting of the dso->long_name is done: # perf trace -e probe_perf: ^C[root@quaco ~] # perf stat -e probe_perf:* -I 2000 2.000404265 0 probe_perf:dso__adjust_kmod_long_name 4.001142200 0 probe_perf:dso__adjust_kmod_long_name 6.001704120 0 probe_perf:dso__adjust_kmod_long_name 8.002398316 0 probe_perf:dso__adjust_kmod_long_name 10.002984010 0 probe_perf:dso__adjust_kmod_long_name 12.003597851 0 probe_perf:dso__adjust_kmod_long_name 14.004113303 0 probe_perf:dso__adjust_kmod_long_name 16.004582773 0 probe_perf:dso__adjust_kmod_long_name 18.005176373 0 probe_perf:dso__adjust_kmod_long_name 20.005801605 0 probe_perf:dso__adjust_kmod_long_name 22.006467540 0 probe_perf:dso__adjust_kmod_long_name ^C 23.683261941 0 probe_perf:dso__adjust_kmod_long_name # Its not being used at all. To further test this I used kvm.ko as the offline module, i.e. removed if from the buildid-cache by nuking it completely (rm -rf ~/.debug) and moved it from the normal kernel distro path, removed the modules, stoped the kvm guest, and then installed it manually, etc. # rmmod kvm-intel # rmmod kvm # lsmod \| grep kvm # modprobe kvm-intel modprobe: ERROR: ctx=0x55d3b1722260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory modprobe: ERROR: ctx=0x55d3b1722260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory modprobe: ERROR: could not insert 'kvm_intel': Unknown symbol in module, or unknown parameter (see dmesg) # insmod ./kvm.ko # modprobe kvm-intel modprobe: ERROR: ctx=0x562f34026260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory modprobe: ERROR: ctx=0x562f34026260 path=/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm/kvm.ko.xz error=No such file or directory # lsmod \| grep kvm kvm_intel 299008 0 kvm 765952 1 kvm_intel irqbypass 16384 1 kvm # # perf probe -x ~/bin/perf machine__findnew_module_map:12 mname=m.name:string filename=filename:string 'dso_long_name=map->dso->long_name:string' 'dso_name=map->dso->name:string' # perf probe -l probe_perf:machine__findnew_module_map (on machine__findnew_module_map:12@util/machine.c in /home/acme/bin/perf with mname filename dso_long_name dso_name) # perf record ^C[ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 3.416 MB perf.data (33956 samples) ] # perf trace -e probe_perf:machine* <SNIP> 6.322 perf/23099 probe_perf:machine__findnew_module_map(__probe_ip: 5492493, mname: "[salsa20_generic]", filename: "/lib/modules/5.3.8-200.fc30.x86_64/kernel/crypto/salsa20_generic.ko.xz", dso_long_name: "/lib/modules/5.3.8-200.fc30.x86_64/kernel/crypto/salsa20_generic.ko.xz", dso_name: "[salsa20_generic]") 6.375 perf/23099 probe_perf:machine__findnew_module_map(__probe_ip: 5492493, mname: "[kvm]", filename: "[kvm]", dso_long_name: "[kvm]", dso_name: "[kvm]") <SNIP> The filename doesn't come with the path, no point in trying to set the dso->long_name. [root@quaco ~]# strace -e open,openat perf probe -m ./kvm.ko kvm_apic_local_deliver \|& egrep 'open.*kvm' openat(AT_FDCWD, "/sys/module/kvm_intel/notes/.note.gnu.build-id", O_RDONLY) = 4 openat(AT_FDCWD, "/sys/module/kvm/notes/.note.gnu.build-id", O_RDONLY) = 4 openat(AT_FDCWD, "/lib/modules/5.3.8-200.fc30.x86_64/kernel/arch/x86/kvm", O_RDONLY\|O_NONBLOCK\|O_CLOEXEC\|O_DIRECTORY) = 7 openat(AT_FDCWD, "/sys/module/kvm_intel/notes/.note.gnu.build-id", O_RDONLY) = 8 openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/.debug/root/kvm.ko/5955f426cb93f03f30f3e876814be2db80ab0b55/probes", O_RDWR\|O_CREAT, 0644) = 3 openat(AT_FDCWD, "/usr/lib/debug/root/kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/usr/lib/debug/root/kvm.ko", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/.debug/kvm.ko", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3 openat(AT_FDCWD, "kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, ".debug/kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "kvm.ko.debug", O_RDONLY) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3 openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 4 openat(AT_FDCWD, "/root/kvm.ko", O_RDONLY) = 3 [root@quaco ~]# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-jlfew3lyb24d58egrp0o72o2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 11:21:32 -03:00
Arnaldo Carvalho de Melo	1ae14516cb	perf map_groups: Add a front end cache for map lookups by name Lets see if it helps: First look at the probeable lines for the function that does lookups by name in a map_groups struct: # perf probe -x ~/bin/perf -L map_groups__find_by_name <map_groups__find_by_name@/home/acme/git/perf/tools/perf/util/symbol.c:0> 0 struct map map_groups__find_by_name(struct map_groups mg, const char name) 1 { 2 struct maps maps = &mg->maps; struct map map; 5 down_read(&maps->lock); 7 if (mg->last_search_by_name && strcmp(mg->last_search_by_name->dso->short_name, name) == 0) { 8 map = mg->last_search_by_name; 9 goto out_unlock; } 12 maps__for_each_entry(maps, map) 13 if (strcmp(map->dso->short_name, name) == 0) { 14 mg->last_search_by_name = map; 15 goto out_unlock; } 18 map = NULL; out_unlock: 21 up_read(&maps->lock); 22 return map; 23 } int dso__load_vmlinux(struct dso dso, struct map map, const char vmlinux, bool vmlinux_allocated) # Now add a probe to the place where we reuse the last search: # perf probe -x ~/bin/perf map_groups__find_by_name:8 Added new event: probe_perf:map_groups__find_by_name (on map_groups__find_by_name:8 in /home/acme/bin/perf) You can now use it in all perf tools, such as: perf record -e probe_perf:map_groups__find_by_name -aR sleep 1 # Now lets do a system wide 'perf stat' counting those events: # perf stat -e probe_perf:* Leave it running and lets do a 'perf top', then, after a while, stop the 'perf stat': # perf stat -e probe_perf:* ^C Performance counter stats for 'system wide': 3,603 probe_perf:map_groups__find_by_name 44.565253139 seconds time elapsed # yeah, good to have. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-tcz37g3nxv3tvxw3q90vga3p@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 11:21:32 -03:00
Arnaldo Carvalho de Melo	c5c584d2db	perf maps: Do not use an rbtree to sort by map name This is only used for the kernel maps, shave 24 bytes out 'struct map' and just traverse the existing per ip rbtree to look for maps by name, use a front end cache to reuse the last search if its the same name. After this 'struct map' is down to just two cachelines: $ pahole -C map ~/bin/perf struct map { union { struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 / struct list_head node; / 0 16 / } __attribute__((__aligned__(8))); / 0 24 / u64 start; / 24 8 / u64 end; / 32 8 / _Bool erange_warned; / 40 1 / / XXX 3 bytes hole, try to pack / u32 priv; / 44 4 / u32 prot; / 48 4 / u32 flags; / 52 4 / u64 pgoff; / 56 8 / / --- cacheline 1 boundary (64 bytes) --- / u64 reloc; / 64 8 / u32 maj; / 72 4 / u32 min; / 76 4 / u64 ino; / 80 8 / u64 ino_generation; / 88 8 / u64 (map_ip)(struct map , u64); / 96 8 / u64 (unmap_ip)(struct map , u64); / 104 8 / struct dso dso; /* 112 8 / refcount_t refcnt; / 120 4 / / size: 128, cachelines: 2, members: 17 / / sum members: 121, holes: 1, sum holes: 3 / / padding: 4 / / forced alignments: 1 */ } __attribute__((__aligned__(8))); $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-bvr8fqfgzxtgnhnwt5sssx5g@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-18 11:19:51 -03:00
Arnaldo Carvalho de Melo	bcb8af5c46	perf maps: Purge the entries from maps->names in __maps__purge() No need to iterate via the ->names rbtree, as all the entries there as in maps->entries as well, reuse __maps__purge() for that. Doing it this way we can kill maps__for_each_entry_by_name(), maps__for_each_entry_by_name_safe(), maps__{first,next}_by_name(). Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-ps0nrio8pydyo23rr2s696ue@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-13 16:06:28 -03:00
Adrian Hunter	af833988c0	perf scripts python: exported-sql-viewer.py: Fix use of TRUE with SQLite Prior to version 3.23 SQLite does not support TRUE or FALSE, so always use 1 and 0 for SQLite. Fixes: `26c11206f4` ("perf scripts python: exported-sql-viewer.py: Use new 'has_calls' column") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v5.3+ Link: http://lore.kernel.org/lkml/20191113120206.26957-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-13 09:13:16 -03:00
James Clark	da3ef7f6cd	perf vendor events power9: Fix commas so PMU event files are valid JSON No functional change. Remove extra commas in the power9 JSON files so that the files can be parsed and validated by other utilities such as Python that fail to parse invalid JSON. Before: $ diffstat -l -p1 /wb/1.patch \| while read filename ; do echo $filename ; cat $filename \| json_verify ; done tools/perf/pmu-events/arch/powerpc/power9/cache.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x300 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/floating-point.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x141 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/frontend.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x250 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/marked.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x301 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/memory.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x300 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/other.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x308 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/pipeline.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x4D0 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/pmc.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x200 (right here) ------^ JSON is invalid tools/perf/pmu-events/arch/powerpc/power9/translation.json parse error: invalid object key (must be a string) [ {, "EventCode": "0x1E" (right here) ------^ JSON is invalid $ After: $ diffstat -l -p1 /wb/1.patch \| while read filename ; do echo $filename ; cat $filename \| json_verify ; done tools/perf/pmu-events/arch/powerpc/power9/cache.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/floating-point.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/frontend.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/marked.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/memory.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/other.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/pipeline.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/pmc.json JSON is valid tools/perf/pmu-events/arch/powerpc/power9/translation.json JSON is valid $ Signed-off-by: James Clark <james.clark@arm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kevin Mooney <kevin.mooney@arm.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: nd@arm.com Link: http://lore.kernel.org/lkml/20191112160342.26470-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2019-11-12 15:26:55 -03:00

1 2 3 4 5 ...

11054 Commits