mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-11-25 02:20:52 +07:00
26cec7480e
20654 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Adrian Hunter
|
26cec7480e |
perf test x86: Add CET instructions to the new instructions test
Add to the "x86 instruction decoder - new instructions" test the following instructions: incsspd incsspq rdsspd rdsspq saveprevssp rstorssp wrssd wrssq wrussd wrussq setssbsy clrssbsy endbr32 endbr64 And the "notrack" prefix for indirect calls and jumps. For information about the instructions, refer Intel Control-flow Enforcement Technology Specification May 2019 (334525-003). Committer testing: $ perf test instr 67: x86 instruction decoder - new instructions : Ok $ Then use verbose mode and check one of those new instructions: $ perf test -v instr |& grep saveprevssp Decoded ok: f3 0f 01 ea saveprevssp Decoded ok: f3 0f 01 ea saveprevssp $ Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi v. Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20200204171425.28073-3-yu-cheng.yu@intel.com Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Yu-cheng Yu
|
315a4af8cd |
x86/insn: Add Control-flow Enforcement (CET) instructions to the opcode map
Add the following CET instructions to the opcode map: INCSSP: Increment Shadow Stack pointer (SSP). RDSSP: Read SSP into a GPR. SAVEPREVSSP: Use "previous ssp" token at top of current Shadow Stack (SHSTK) to create a "restore token" on the previous (outgoing) SHSTK. RSTORSSP: Restore from a "restore token" to SSP. WRSS: Write to kernel-mode SHSTK (kernel-mode instruction). WRUSS: Write to user-mode SHSTK (kernel-mode instruction). SETSSBSY: Verify the "supervisor token" pointed by MSR_IA32_PL0_SSP, set the token busy, and set then Shadow Stack pointer(SSP) to the value of MSR_IA32_PL0_SSP. CLRSSBSY: Verify the "supervisor token" and clear its busy bit. ENDBR64/ENDBR32: Mark a valid 64/32 bit control transfer endpoint. Detailed information of CET instructions can be found in Intel Software Developer's Manual. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi v. Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20200204171425.28073-2-yu-cheng.yu@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
He Zhe
|
e4ffd066ff |
perf: Normalize gcc parameter when generating arch errno table
The $(CC) passed to arch_errno_names.sh may include a series of parameters along with gcc itself. To avoid overwriting the following parameters of arch_errno_names.sh and break the build like below, we just pick up the first word of the $(CC). find: unknown predicate `-m64/arch' x86_64-wrs-linux-gcc: warning: '-x c' after last input file has no effect x86_64-wrs-linux-gcc: error: unrecognized command line option '-m64/include/uapi/asm-generic/errno.h' x86_64-wrs-linux-gcc: fatal error: no input files Signed-off-by: He Zhe <zhe.he@windriver.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1581618066-187262-2-git-send-email-zhe.he@windriver.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
2a3d252dff |
perf parse-events: Add defensive NULL check
Terms may have a NULL config in which case a strcmp will SEGV. This can be reproduced with: perf stat -e '*/event=?,nr/' sleep 1 Add a NULL check to avoid this. This was caught by LLVM's libfuzzer. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20200325164022.41385-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Tony Jones
|
eadcaa3dfd |
perf callchain: Update docs regarding kernel/user space unwinding
The method of unwinding for kernel space is defined by the kernel config, not by the value of --call-graph. Improve the documentation to reflect this. Signed-off-by: Tony Jones <tonyj@suse.de> Link: http://lore.kernel.org/lkml/20200325164053.10177-1-tonyj@suse.de Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ravi Bangoria
|
0d33b34352 |
perf dso: Fix dso comparison
Perf gets dso details from two different sources. 1st, from builid headers in perf.data and 2nd from MMAP2 samples. Dso from buildid header does not have dso_id detail. And dso from MMAP2 samples does not have buildid information. If detail of the same dso is present at both the places, filename is common. Previously, __dsos__findnew_link_by_longname_id() used to compare only long or short names, but Commit |
||
Christophe JAILLET
|
d74b181a02 |
perf cpumap: Fix snprintf overflow check
'snprintf' returns the number of characters which would be generated for the given input. If the returned value is *greater than* or equal to the buffer size, it means that the output has been truncated. Fix the overflow test accordingly. Fixes: |
||
John Garry
|
956a78356c |
perf test: Test pmu-events aliases
Add creating event aliases to the pmu-events test. So currently we verify that the generated pmu-events.c is as expected for some test events. Now test that we generate aliases as expected for those events during normal operation. For that, we cycle through each HW PMU in the system, and use the test events to create aliases, and verify those against known, expected values. For core PMUs, we should create an alias for every event in test_cpu_events[]. However, for uncore PMUs, they need to be matched by the pmu_event.pmu member, so use test_uncore_events[]; so check the match beforehand with pmu_uncore_alias_match(). A sample run is as follows for my x86 machine: john@linux-3c19:~/linux> tools/perf/perf test -vv 10 10: PMU events : --- start --- ... testing PMU uncore_arb aliases: no events to match testing PMU cstate_pkg aliases: no events to match skipping testing PMU breakpoint testing aliases PMU uncore_cbox_1: matched event unc_cbo_xsnp_response.miss_eviction testing PMU uncore_cbox_1 aliases: pass testing PMU power aliases: no events to match testing aliases PMU cpu: matched event bp_l1_btb_correct testing aliases PMU cpu: matched event bp_l2_btb_correct testing aliases PMU cpu: matched event segment_reg_loads.any testing aliases PMU cpu: matched event dispatch_blocked.any testing aliases PMU cpu: matched event eist_trans testing PMU cpu aliases: pass testing PMU intel_pt aliases: no events to match skipping testing PMU software skipping testing PMU intel_bts testing PMU uncore_imc aliases: no events to match testing aliases PMU uncore_cbox_0: matched event unc_cbo_xsnp_response.miss_eviction testing PMU uncore_cbox_0 aliases: pass testing PMU cstate_core aliases: no events to match skipping testing PMU tracepoint testing PMU msr aliases: no events to match test child finished with 0 Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-8-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
John Garry
|
5b9a50001b |
perf pmu: Make pmu_uncore_alias_match() public
The perf pmu-events test will want to use pmu_uncore_alias_match(), so make it public. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-7-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
John Garry
|
d504fae93d |
perf pmu: Add is_pmu_core()
Add a function to decide whether a PMU is a core PMU. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-6-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
John Garry
|
a6c925fd3a |
perf test: Add pmu-events test
The initial test will verify that the test tables in generated pmu-events.c match against known, expected values. For known events added in pmu-events/arch/test, we need to add an entry in test_cpu_aliases_events[] or test_uncore_events[]. A sample run is as follows for x86: john@linux-3c19:~/linux> tools/perf/perf test -vv 10 10: PMU event aliases : --- start --- test child forked, pid 5316 testing event table bp_l1_btb_correct: pass testing event table bp_l2_btb_correct: pass testing event table segment_reg_loads.any: pass testing event table dispatch_blocked.any: pass testing event table eist_trans: pass testing event table uncore_hisi_ddrc.flux_wcmd: pass testing event table unc_cbo_xsnp_response.miss_eviction: pass test child finished with 0 ---- end ---- PMU event aliases: Ok Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com [ Fixup test_cpu_events[] and test_uncore_events[] sentinels to initialize one of its members to NULL, fixing the build in older compilers ] Link: http://lore.kernel.org/lkml/1584442939-8911-5-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
John Garry
|
e45ad701e7 |
perf pmu: Refactor pmu_add_cpu_aliases()
Create pmu_add_cpu_aliases_map() from pmu_add_cpu_aliases(), so the caller can pass the map; the pmu-events test would use this since there would be no CPUID matching to a mapfile there. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-4-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
John Garry
|
d844780887 |
perf jevents: Support test events folder
With the goal of supporting pmu-events test case, introduce support for a test events folder. These test events can be used for testing generation of pmu-event tables and alias creation for any arch. When running the pmu-events test case, these test events will be used as the platform-agnostic events, so aliases can be created per-PMU and validated against known expected values. To support the test events, add a "testcpu" entry in pmu_events_map[]. The pmu-events test will be able to lookup the events map for "testcpu", to verify the generated tables against expected values. The resultant generated pmu-events.c will now look like the following: struct pmu_event pme_ampere_emag[] = { { .name = "ldrex_spec", .event = "event=0x6c", .desc = "Exclusive operation spe...", .topic = "intrinsic", .long_desc = "Exclusive operation ...", }, ... }; struct pmu_event pme_test_cpu[] = { { .name = "uncore_hisi_ddrc.flux_wcmd", .event = "event=0x2", .desc = "DDRC write commands. Unit: hisi_sccl,ddrc ", .topic = "uncore", .long_desc = "DDRC write commands", .pmu = "hisi_sccl,ddrc", }, { .name = "unc_cbo_xsnp_response.miss_eviction", .event = "umask=0x81,event=0x22", .desc = "Unit: uncore_cbox A cross-core snoop resulted ...", .topic = "uncore", .long_desc = "A cross-core snoop resulted from L3 ...", .pmu = "uncore_cbox", }, { .name = "eist_trans", .event = "umask=0x0,period=200000,event=0x3a", .desc = "Number of Enhanced Intel SpeedStep(R) ...", .topic = "other", }, { .name = 0, }, }; struct pmu_events_map pmu_events_map[] = { ... { .cpuid = "0x00000000500f0000", .version = "v1", .type = "core", .table = pme_ampere_emag }, ... { .cpuid = "testcpu", .version = "v1", .type = "core", .table = pme_test_cpu, }, { .cpuid = 0, .version = 0, .type = 0, .table = 0, }, }; Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-3-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
John Garry
|
c52db67a74 |
perf jevents: Add some test events
Add some test PMU events. The events are randomly chosen from x86 and arm64 JSONs. The events include CPU and uncore events. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-2-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jiri Olsa
|
7cd053d4cf |
perf tools: Unify a bit the build directory output
Removing the extra 'SUBDIR' line from clean and doc build output. Because it's annoying.. ;-) Before: $ make clean ... SUBDIR Documentation CLEAN Documentation After: $ make clean ... CLEAN Documentation Before: $ make doc BUILD: Doing 'make -j8' parallel build SUBDIR Documentation ASCIIDOC perf-stat.html ... After: $ make doc BUILD: Doing 'make -j8' parallel build ASCIIDOC perf-stat.html ... Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200318204522.1200981-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
29f36c1688 |
tools headers uapi: Update linux/in.h copy
To get the changes in:
|
||
Vijay Thakkar
|
b5b8a7cf14 |
perf vendor events amd: Update Zen1 events to V2
This patch updates the PMCs for AMD Zen1 core based processors (Family 17h; Models 0 through 2F) to be in accordance with PMCs as documented in the latest versions of the AMD Processor Programming Reference [1], [2] and [3]. Note that some events, such as FPU pipe assignment are missing in [1], and therefore [3] is included for full coverage of events. PMCs added: fpu_pipe_assignment.dual{0|1|2|3} fpu_pipe_assignment.total{0|1|2|3} ls_mab_alloc.dc_prefetcher ls_mab_alloc.stores ls_mab_alloc.loads bp_dyn_ind_pred bp_de_redirect PMC removed: ex_ret_cond_misp Cumulative counts, fpu_pipe_assignment.total and fpu_pipe_assignment.dual, existed in v1, but did expose port-level counters. ex_ret_cond_misp has been removed as it has been removed from the latest versions of the PPR, and when tested, always seems to sample zero as tested on a Ryzen 3400G system. [1]: Processor Programming Reference (PPR) for AMD Family 17h Models 01h,08h, Revision B2 Processors, 54945 Rev 3.03 - Jun 14, 2019. [2]: Processor Programming Reference (PPR) for AMD Family 17h Model 18h, Revision B1 Processors, 55570-B1 Rev 3.14 - Sep 26, 2019. [3]: OSRR for AMD Family 17h processors, Models 00h-2Fh, 56255 Rev 3.03 - July, 2018 All of the PPRs can be found at: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Vijay Thakkar <vijaythakkar@me.com> Acked-by: Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jon Grimm <jon.grimm@amd.com> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: vijay thakkar <vijaythakkar@me.com> Link: http://lore.kernel.org/lkml/20200318190002.307290-4-vijaythakkar@me.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Vijay Thakkar
|
2079f7aa0a |
perf vendor events amd: Add Zen2 events
This patch adds PMU events for AMD Zen2 core based processors, namely, Matisse (model 71h), Castle Peak (model 31h) and Rome (model 2xh), as documented in the AMD Processor Programming Reference for Matisse [1]. The model number regex has been set to detect all the models under family 17 that do not match those of Zen1, as the range is larger for zen2. Zen2 adds some additional counters that are not present in Zen1 and events for them have been added in this patch. Some counters have also been removed for Zen2 thatwere previously present in Zen1 and have been confirmed to always sample zero on zen2. These added/removed counters have been omitted for brevity but can be found here: https://gist.github.com/thakkarV/5b12ca5fd7488eb2c42e451e40bdd5f3 Note that PPR for Zen2 [1] does not include some counters that were documented in the PPR for Zen1 based processors [2]. After having tested these counters, some of them that still work for zen2 systems have been preserved in the events for zen2. The counters that are omitted in [1] but are still measurable and non-zero on zen2 (tested on a Ryzen 3900X system) are the following: PMC 0x000 fpu_pipe_assignment.{total|total0|total1|total2|total3} PMC 0x004 fp_num_mov_elim_scal_op.* PMC 0x046 ls_tablewalker.* PMC 0x062 l2_latency.l2_cycles_waiting_on_fills PMC 0x063 l2_wcb_req.* PMC 0x06D l2_fill_pending.l2_fill_busy PMC 0x080 ic_fw32 PMC 0x081 ic_fw32_miss PMC 0x086 bp_snp_re_sync PMC 0x087 ic_fetch_stall.* PMC 0x08C ic_cache_inval.* PMC 0x099 bp_tlb_rel PMC 0x0C7 ex_ret_brn_resync PMC 0x28A ic_oc_mode_switch.* L3PMC 0x001 l3_request_g1.* L3PMC 0x006 l3_comb_clstr_state.* [1]: Processor Programming Reference (PPR) for AMD Family 17h Model 71h, Revision B0 Processors, 56176 Rev 3.06 - Jul 17, 2019 [2]: Processor Programming Reference (PPR) for AMD Family 17h Models 01h,08h, Revision B2 Processors, 54945 Rev 3.03 - Jun 14, 2019 All of the PPRs can be found at: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Here are the results of running "fpu_pipe_assignment.total" events on my Ryzen 3900X family 17h model 71h system: Before this patch: $> perf list *fpu_pipe_assignment* List of pre-defined events (to be used in -e): After: $> perf list *fpu_pipe_assignment* floating point: fpu_pipe_assignment.total [Total number of fp uOps] fpu_pipe_assignment.total0 [Total number uOps assigned to pipe 0] fpu_pipe_assignment.total1 [Total number uOps assigned to pipe 1] fpu_pipe_assignment.total2 [Total number uOps assigned to pipe 2] fpu_pipe_assignment.total3 [Total number uOps assigned to pipe 3] Metric Groups: $> perf stat -e fpu_pipe_assignment.total sleep 1 Performance counter stats for 'sleep 1': 25,883 fpu_pipe_assignment.total 1.004145868 seconds time elapsed 0.001805000 seconds user 0.000000000 seconds sys Usage tests while running Linpackin the background: $> perf stat -I1000 -e fpu_pipe_assignment.total 1.000266796 79,313,191,516 fpu_pipe_assignment.total 2.000809630 68,091,474,430 fpu_pipe_assignment.total 3.001028115 52,925,023,174 fpu_pipe_assignment.total $> perf record -e fpu_pipe_assignment.total,fpu_pipe_assignment.total0 -a sleep 1 [ perf record: Woken up 9 times to write data ] [ perf record: Captured and wrote 4.031 MB perf.data (64764 samples) ] $> perf report --stdio --no-header | head -30 98.33% xhpl xhpl [.] dgemm_kernel 0.28% xhpl xhpl [.] dtrsm_kernel_LT 0.10% xhpl [kernel.kallsyms] [k] entry_SYSCALL_64 0.08% xhpl xhpl [.] idamax_k 0.07% baloo_file_extr liblmdb.so [.] mdb_mid2l_insert 0.06% xhpl xhpl [.] dgemm_itcopy 0.06% xhpl xhpl [.] dgemm_oncopy 0.06% xhpl [kernel.kallsyms] [k] __schedule 0.06% xhpl [kernel.kallsyms] [k] syscall_trace_enter 0.06% xhpl [kernel.kallsyms] [k] native_sched_clock 0.06% xhpl [kernel.kallsyms] [k] pick_next_task_fair 0.05% xhpl xhpl [.] blas_thread_server.llvm.15009391670273914865 0.04% xhpl [kernel.kallsyms] [k] do_syscall_64 0.04% xhpl [kernel.kallsyms] [k] yield_task_fair 0.04% xhpl libpthread-2.31.so [.] __pthread_mutex_unlock_usercnt 0.03% xhpl [kernel.kallsyms] [k] cpuacct_charge 0.03% xhpl [kernel.kallsyms] [k] syscall_return_via_sysret 0.03% xhpl libc-2.31.so [.] __sched_yield 0.03% xhpl [kernel.kallsyms] [k] __calc_delta $> perf annotate --stdio2 dgemm_kernel | egrep '^ {0,2}[0-9]+' -B2 -A2 sub $0x60,%rsp mov %rbx,(%rsp) 0.00 mov %rbp,0x8(%rsp) mov %r12,0x10(%rsp) 0.00 mov %r13,0x18(%rsp) mov %r14,0x20(%rsp) mov %r15,0x28(%rsp) -- mov %rdi,%r13 mov %rsi,0x28(%rsp) 0.00 mov %rdx,%r12 vmovsd %xmm0,0x30(%rsp) shl $0x3,%r10 mov 0x28(%rsp),%rax 0.00 xor %rdx,%rdx mov $0x18,%rdi div %rdi -- nop a0: mov %r12,%rax 0.00 shl $0x3,%rax mov %r8,%rdi lea (%r8,%rax,8),%r15 -- mov %r12,%rax nop 0.00 c0: vmovups (%rdi),%ymm1 0.09 vmovups 0x20(%rdi),%ymm2 0.02 vmovups (%r15),%ymm3 0.10 vmovups %ymm1,(%rsi) 0.07 vmovups %ymm2,0x20(%rsi) 0.07 vmovups %ymm3,0x40(%rsi) 0.06 add $0x40,%rdi add $0x40,%r15 add $0x60,%rsi 0.00 dec %rax ↑ jne c0 mov %r9,%r15 -- nop 110: lea 0x80(%rsp),%rsi 0.01 add $0x60,%rsi 0.03 mov %r12,%rax 0.00 sar $0x3,%rax cmp $0x2,%rax ↓ jl d26 prefetcht0 0x200(%rdi) 0.01 vmovups -0x60(%rsi),%ymm1 0.02 prefetcht0 0xa0(%rsi) 0.00 vbroadcastsd -0x80(%rdi),%ymm0 0.00 prefetcht0 0xe0(%rsi) 0.03 vmovups -0x40(%rsi),%ymm2 0.00 prefetcht0 0x120(%rsi) vmovups -0x20(%rsi),%ymm3 vmulpd %ymm0,%ymm1,%ymm4 0.01 prefetcht0 0x160(%rsi) vmulpd %ymm0,%ymm2,%ymm8 0.01 vmulpd %ymm0,%ymm3,%ymm12 0.02 prefetcht0 0x1a0(%rsi) 0.01 vbroadcastsd -0x78(%rdi),%ymm0 vmulpd %ymm0,%ymm1,%ymm5 0.01 vmulpd %ymm0,%ymm2,%ymm9 vmulpd %ymm0,%ymm3,%ymm13 0.01 vbroadcastsd -0x70(%rdi),%ymm0 vmulpd %ymm0,%ymm1,%ymm6 0.00 vmulpd %ymm0,%ymm2,%ymm10 0.00 add $0x60,%rsi ... snip ... nop 65e0: vmovddup -0x60(%rsi),%xmm2 0.00 vmovups -0x80(%rdi),%xmm0 vmovups -0x70(%rdi),%xmm1 0.00 vmovddup -0x58(%rsi),%xmm3 vfmadd231pd %xmm0,%xmm2,%xmm4 0.00 vfmadd231pd %xmm1,%xmm2,%xmm5 0.00 vfmadd231pd %xmm0,%xmm3,%xmm6 0.00 vfmadd231pd %xmm1,%xmm3,%xmm7 0.00 add $0x10,%rsi add $0x20,%rdi 0.00 dec %rax ↑ jne 65e0 nop nop 6620: vmovddup 0x30(%rsp),%xmm0 0.00 vmulpd %xmm0,%xmm4,%xmm4 0.00 vmulpd %xmm0,%xmm5,%xmm5 vmulpd %xmm0,%xmm6,%xmm6 vmulpd %xmm0,%xmm7,%xmm7 vaddpd (%r15),%xmm4,%xmm4 vaddpd 0x10(%r15),%xmm5,%xmm5 0.00 vaddpd (%r15,%r10,1),%xmm6,%xmm6 0.00 vaddpd 0x10(%r15,%r10,1),%xmm7,%xmm7 0.00 vmovups %xmm4,(%r15) vmovups %xmm5,0x10(%r15) 0.00 vmovups %xmm6,(%r15,%r10,1) vmovups %xmm7,0x10(%r15,%r10,1) add $0x20,%r15 -- lea (%r8,%rax,8),%r8 69d8: mov 0x20(%rsp),%r14 0.00 test $0x1,%r14 ↓ je 6d84 mov %r9,%r15 -- vbroadcastsd -0x28(%rsi),%ymm3 vfmadd231pd (%rdi),%ymm0,%ymm4 0.00 vfmadd231pd 0x20(%rdi),%ymm1,%ymm5 vfmadd231pd 0x40(%rdi),%ymm2,%ymm6 vfmadd231pd 0x60(%rdi),%ymm3,%ymm7 -- vmulpd %ymm0,%ymm4,%ymm4 vaddpd (%r15),%ymm4,%ymm4 0.00 vmovups %ymm4,(%r15) add $0x20,%r15 dec %r11 -- mov %rbx,%rsp mov (%rsp),%rbx 0.01 mov 0x8(%rsp),%rbp mov 0x10(%rsp),%r12 mov 0x18(%rsp),%r13 Signed-off-by: Vijay Thakkar <vijaythakkar@me.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jon Grimm <jon.grimm@amd.com> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200318190002.307290-3-vijaythakkar@me.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Vijay Thakkar
|
c5f18e9e94 |
perf vendor events amd: Restrict model detection for zen1 based processors
This patch changes the previous blanket detection of AMD Family 17h processors to be more specific to Zen1 core based products only by replacing model detection regex pattern [[:xdigit:]]+ with ([12][0-9A-F]|[0-9A-F]), restricting to models 0 though 2f only. This change is required to allow for the addition of separate PMU events for Zen2 core based models in the following patches as those belong to family 17h but have different PMCs. Current PMU events directory has also been renamed to "amdzen1" from "amdfam17h" to reflect this specificity. Note that although this change does not break PMU counters for existing zen1 based systems, it does disable the current set of counters for zen2 based systems. Counters for zen2 have been added in the following patches in this patchset. Signed-off-by: Vijay Thakkar <vijaythakkar@me.com> Acked-by: Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jon Grimm <jon.grimm@amd.com> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200318190002.307290-2-vijaythakkar@me.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Kajol Jain
|
58fc90fda0 |
perf metricgroup: Fix printing event names of metric group with multiple events incase of overlapping events
Commit
|
||
Jin Yao
|
d13e9e413e |
perf stat: Align the output for interval aggregation mode
There is a slight misalignment in -A -I output. For example: # perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000 # time CPU counts unit events 1.000440863 CPU0 1,068,388 cpu/event=cpu-cycles/ 1.000440863 CPU1 875,954 cpu/event=cpu-cycles/ 1.000440863 CPU2 3,072,538 cpu/event=cpu-cycles/ 1.000440863 CPU3 4,026,870 cpu/event=cpu-cycles/ 1.000440863 CPU4 5,919,630 cpu/event=cpu-cycles/ 1.000440863 CPU5 2,714,260 cpu/event=cpu-cycles/ 1.000440863 CPU6 2,219,240 cpu/event=cpu-cycles/ 1.000440863 CPU7 1,299,232 cpu/event=cpu-cycles/ The value of counts is not aligned with the column "counts" and the event name is not aligned with the column "events". With this patch, the output is, # perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000 # time CPU counts unit events 1.000423009 CPU0 997,421 cpu/event=cpu-cycles/ 1.000423009 CPU1 1,422,042 cpu/event=cpu-cycles/ 1.000423009 CPU2 484,651 cpu/event=cpu-cycles/ 1.000423009 CPU3 525,791 cpu/event=cpu-cycles/ 1.000423009 CPU4 1,370,100 cpu/event=cpu-cycles/ 1.000423009 CPU5 442,072 cpu/event=cpu-cycles/ 1.000423009 CPU6 205,643 cpu/event=cpu-cycles/ 1.000423009 CPU7 1,302,250 cpu/event=cpu-cycles/ Now output is aligned. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200218071614.25736-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
dbddf17474 |
perf report/top TUI: Support hotkeys to let user select any event for sorting
When performing "perf report --group", it shows the event group information together. In previous patch, we have supported a new option "--group-sort-idx" to sort the output by the event at the index n in event group. It would be nice if we can use a hotkey in browser to select a event to sort. For example, # perf report --group Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, ... Overhead Command Shared Object Symbol 92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1 3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515 1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7 1.56% 0.01% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494ce 1.56% 0.00% 0.00% 0.00% mgen [kernel.kallsyms] [k] task_tick_fair 0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single 0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle 0.00% 0.03% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] g_main_context_check 0.00% 0.03% 0.00% 0.00% swapper [kernel.kallsyms] [k] apic_timer_interrupt 0.00% 0.03% 0.00% 0.00% swapper [kernel.kallsyms] [k] check_preempt_curr When user press hotkey '3' (event index, starting from 0), it indicates to sort output by the forth event in group. Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, ... Overhead Command Shared Object Symbol 92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1 0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle 3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515 0.00% 0.00% 0.00% 0.06% swapper [kernel.kallsyms] [k] hrtimer_start_range_ns 1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7 0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] update_curr 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] apic_timer_interrupt 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] native_apic_msr_eoi_write 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] __update_load_avg_se v6: --- Jiri provided a good improvement to eliminate unneeded refresh. This improvement is added to v6. v2: --- 1. Report warning at helpline when index is invalid. 2. Report warning at helpline when it's not group event. 3. Use "case '0' ... '9'" to refine the code 4. Split K_RELOAD implementation to another patch. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200220013616.19916-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
5e3b810aac |
perf report: Support a new key to reload the browser
Sometimes we may need to reload the browser to update the output since some options are changed. This patch creates a new key K_RELOAD. Once the __cmd_report() returns K_RELOAD, it would repeat the whole process, such as, read samples from data file, sort the data and display in the browser. v5: --- 1. Fix the 'make NO_SLANG=1' error. Define K_RELOAD in util/hist.h. 2. Skip setup_sorting() in repeat path if last key is K_RELOAD. v4: --- Need to quit in perf_evsel_menu__run if key is K_RELOAD. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200220013616.19916-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
429a5f9d89 |
perf report: Allow specifying event to be used as sort key in --group output
When performing "perf report --group", it shows the event group information together. By default, the output is sorted by the first event in group. It would be nice for user to select any event for sorting. This patch introduces a new option "--group-sort-idx" to sort the output by the event at the index n in event group. For example, Before: # perf report --group --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1, # Event count (approx.): 6451235635 # # Overhead Command Shared Object Symbol # ................................ ......... ....................... ................................... # 92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1 3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515 1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7 1.56% 0.01% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494ce 1.56% 0.00% 0.00% 0.00% mgen [kernel.kallsyms] [k] task_tick_fair 0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single 0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle 0.00% 0.03% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] g_main_context_check 0.00% 0.03% 0.00% 0.00% swapper [kernel.kallsyms] [k] apic_timer_interrupt ... After: # perf report --group --stdio --group-sort-idx 3 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1, # Event count (approx.): 6451235635 # # Overhead Command Shared Object Symbol # ................................ ......... ....................... ................................... # 92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1 0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle 3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515 0.00% 0.00% 0.00% 0.06% swapper [kernel.kallsyms] [k] hrtimer_start_range_ns 1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7 0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] update_curr 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] apic_timer_interrupt 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] native_apic_msr_eoi_write 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] __update_load_avg_se 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] scheduler_tick Now the output is sorted by the fourth event in group. v7: --- Rebase to latest perf/core, no other change. v4: --- 1. Update Documentation/perf-report.txt to mention '--group-sort-idx' support multiple groups with different amount of events and it should be used on grouped events. 2. Update __hpp__group_sort_idx(), just return when the idx is out of limit. 3. Return failure on symbol_conf.group_sort_idx && !session->evlist->nr_groups. So now we don't need to use together with --group. v3: --- Refine the code in __hpp__group_sort_idx(). Before: for (i = 1; i < nr_members; i++) { if (i == idx) { ret = field_cmp(fields_a[i], fields_b[i]); if (ret) goto out; } } After: if (idx >= 1 && idx < nr_members) { ret = field_cmp(fields_a[idx], fields_b[idx]); if (ret) goto out; } Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200220013616.19916-2-yao.jin@linux.intel.com [ Renamed pair_fields_alloc() to hist_entry__new_pair() and combined decl + assignment of vars ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
ec0479a63b |
perf report/top TUI: Support hotkey 'a' for annotation of unresolved addresses
In previous patch, we have supported the annotation functionality even without symbols. For this patch, it supports the hotkey 'a' on address in report view. Note that, for branch mode, we only support the annotation for "branch to" address. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200227043939.4403-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
7b0a0dcb64 |
perf report: Support interactive annotation of code without symbols
For perf report on stripped binaries it is currently impossible to do annotation. The annotation state is all tied to symbols, but there are either no symbols, or symbols are not covering all the code. We should support the annotation functionality even without symbols. This patch fakes a symbol and the symbol name is the string of address. After that, we just follow current annotation working flow. For example, 1. perf report Overhead Command Shared Object Symbol 20.67% div libc-2.27.so [.] __random_r 17.29% div libc-2.27.so [.] __random 10.59% div div [.] 0x0000000000000628 9.25% div div [.] 0x0000000000000612 6.11% div div [.] 0x0000000000000645 2. Select the line of "10.59% div div [.] 0x0000000000000628" and ENTER. Annotate 0x0000000000000628 Zoom into div thread Zoom into div DSO (use the 'k' hotkey to zoom directly into the kernel) Browse map details Run scripts for samples of symbol [0x0000000000000628] Run scripts for all samples Switch to another data file in PWD Exit 3. Select the "Annotate 0x0000000000000628" and ENTER. Percent│ │ │ │ Disassembly of section .text: │ │ 0000000000000628 <.text+0x68>: │ divsd %xmm4,%xmm0 │ divsd %xmm3,%xmm1 │ movsd (%rsp),%xmm2 │ addsd %xmm1,%xmm0 │ addsd %xmm2,%xmm0 │ movsd %xmm0,(%rsp) Now we can see the dump of object starting from 0x628. v5: --- Remove the hotkey 'a' implementation from this patch. It will be moved to a separate patch. v4: --- 1. Support the hotkey 'a'. When we press 'a' on address, now it supports the annotation. 2. Change the patch title from "Support interactive annotation of code without symbols" to "perf report: Support interactive annotation of code without symbols" v3: --- Keep just the ANNOTATION_DUMMY_LEN, and remove the opts->annotate_dummy_len since it's the "maybe in future we will provide" feature. v2: --- Fix a crash issue when annotating an address in "unknown" object. The steps to reproduce this issue: perf record -e cycles:u ls perf report 75.29% ls ld-2.27.so [.] do_lookup_x 23.64% ls ld-2.27.so [.] __GI___tunables_init 1.04% ls [unknown] [k] 0xffffffff85c01210 0.03% ls ld-2.27.so [.] _start When annotating 0xffffffff85c01210, the crash happens. v2 adds checking for ms->map in add_annotate_opt(). If the object is "unknown", ms->map is NULL. Committer notes: Renamed new_annotate_sym() to symbol__new_unresolved(). Use PRIx64 to fix this issue in some 32-bit arches: ui/browsers/hists.c: In function 'symbol__new_unresolved': ui/browsers/hists.c:2474:38: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'u64' {aka 'long long unsigned int'} [-Werror=format=] snprintf(name, sizeof(name), "%-#.*lx", BITS_PER_LONG / 4, addr); ~~~~~~^ ~~~~ %-#.*llx Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200227043939.4403-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
443bc639e5 |
perf report: Print al_addr when symbol is not found
For branch mode, if the symbol is not found, it prints the address. For example, 0x0000555eee0365a0 in below output. Overhead Command Source Shared Object Source Symbol Target Symbol 17.55% div libc-2.27.so [.] __random [.] __random 6.11% div div [.] 0x0000555eee0365a0 [.] rand 6.10% div libc-2.27.so [.] rand [.] 0x0000555eee036769 5.80% div libc-2.27.so [.] __random_r [.] __random 5.72% div libc-2.27.so [.] __random [.] __random_r 5.62% div libc-2.27.so [.] __random_r [.] __random_r 5.38% div libc-2.27.so [.] __random [.] rand 4.56% div libc-2.27.so [.] __random [.] __random 4.49% div div [.] 0x0000555eee036779 [.] 0x0000555eee0365ff 4.25% div div [.] 0x0000555eee0365fa [.] 0x0000555eee036760 But it's not very easy to understand what the instructions are in the binary. So this patch uses the al_addr instead. With this patch, the output is Overhead Command Source Shared Object Source Symbol Target Symbol 17.55% div libc-2.27.so [.] __random [.] __random 6.11% div div [.] 0x00000000000005a0 [.] rand 6.10% div libc-2.27.so [.] rand [.] 0x0000000000000769 5.80% div libc-2.27.so [.] __random_r [.] __random 5.72% div libc-2.27.so [.] __random [.] __random_r 5.62% div libc-2.27.so [.] __random_r [.] __random_r 5.38% div libc-2.27.so [.] __random [.] rand 4.56% div libc-2.27.so [.] __random [.] __random 4.49% div div [.] 0x0000000000000779 [.] 0x00000000000005ff 4.25% div div [.] 0x00000000000005fa [.] 0x0000000000000760 Now we can use objdump to dump the object starting from 0x5a0. For example, objdump -d --start-address 0x5a0 div 00000000000005a0 <rand@plt>: 5a0: ff 25 2a 0a 20 00 jmpq *0x200a2a(%rip) # 200fd0 <__cxa_finalize@plt+0x200a20> 5a6: 68 02 00 00 00 pushq $0x2 5ab: e9 c0 ff ff ff jmpq 570 <srand@plt-0x10> ... Committer testing: [root@seventh ~]# perf record -a -b sleep 1 [root@seventh ~]# perf report --header-only | grep cpudesc # cpudesc : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz [root@seventh ~]# perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY [root@seventh ~]# Before: [root@seventh ~]# perf report --stdio --dso libsystemd-shared-241.so | head -20 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 2K of event 'cycles' # Event count (approx.): 2240 # # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles # ........ ............... ........................ ...................... ...................... .................. # 0.13% systemd-journal libc-2.29.so [.] cfree@GLIBC_2.2.5 [.] _int_free 1 0.09% systemd libsystemd-shared-241.so [.] 0x00007fe406465c82 [.] 0x00007fe406465d80 1 0.09% systemd libsystemd-shared-241.so [.] 0x00007fe406465ded [.] 0x00007fe406465c30 1 0.09% systemd libsystemd-shared-241.so [.] 0x00007fe406465e4e [.] 0x00007fe406465de0 1 0.09% systemd-journal systemd-journald [.] free@plt [.] cfree@GLIBC_2.2.5 1 0.09% systemd-journal libc-2.29.so [.] _int_free [.] _int_free 18 0.09% systemd-journal libc-2.29.so [.] _int_free [.] _int_free 2 0.04% systemd libsystemd-shared-241.so [.] bus_resolve@plt [.] bus_resolve 204 0.04% systemd libsystemd-shared-241.so [.] getpid_cached@plt [.] getpid_cached 7 [root@seventh ~]# After: [root@seventh ~]# perf report --stdio --dso libsystemd-shared-241.so | head -20 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 2K of event 'cycles' # Event count (approx.): 2240 # # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles # ........ ............... ........................ ...................... ...................... .................. # 0.13% systemd-journal libc-2.29.so [.] cfree@GLIBC_2.2.5 [.] _int_free 1 0.09% systemd libsystemd-shared-241.so [.] 0x00000000000f7c82 [.] 0x00000000000f7d80 1 0.09% systemd libsystemd-shared-241.so [.] 0x00000000000f7ded [.] 0x00000000000f7c30 1 0.09% systemd libsystemd-shared-241.so [.] 0x00000000000f7e4e [.] 0x00000000000f7de0 1 0.09% systemd-journal systemd-journald [.] free@plt [.] cfree@GLIBC_2.2.5 1 0.09% systemd-journal libc-2.29.so [.] _int_free [.] _int_free 18 0.09% systemd-journal libc-2.29.so [.] _int_free [.] _int_free 2 0.04% systemd libsystemd-shared-241.so [.] bus_resolve@plt [.] bus_resolve 204 0.04% systemd libsystemd-shared-241.so [.] getpid_cached@plt [.] getpid_cached 7 [root@seventh ~]# Lets use -v to get full paths and then try objdump on the unresolved address: [root@seventh ~]# perf report -v --stdio --dso libsystemd-shared-241.so |& grep libsystemd-shared-241.so | tail -1 0.04% systemd-journal /usr/lib/systemd/libsystemd-shared-241.so 0x80c1a B [.] 0x0000000000080c1a 0x80a95 B [.] 0x0000000000080a95 61 [root@seventh ~]# [root@seventh ~]# objdump -d --start-address 0x00000000000f7d80 /usr/lib/systemd/libsystemd-shared-241.so | head -20 /usr/lib/systemd/libsystemd-shared-241.so: file format elf64-x86-64 Disassembly of section .text: 00000000000f7d80 <proc_cmdline_parse_given@@SD_SHARED+0x330>: f7d80: 41 39 11 cmp %edx,(%r9) f7d83: 0f 84 ff fe ff ff je f7c88 <proc_cmdline_parse_given@@SD_SHARED+0x238> f7d89: 4c 8d 05 97 09 0c 00 lea 0xc0997(%rip),%r8 # 1b8727 <utf8_skip_data@@SD_SHARED+0x3147> f7d90: b9 49 00 00 00 mov $0x49,%ecx f7d95: 48 8d 15 c9 f5 0b 00 lea 0xbf5c9(%rip),%rdx # 1b7365 <utf8_skip_data@@SD_SHARED+0x1d85> f7d9c: 31 ff xor %edi,%edi f7d9e: 48 8d 35 9b ff 0b 00 lea 0xbff9b(%rip),%rsi # 1b7d40 <utf8_skip_data@@SD_SHARED+0x2760> f7da5: e8 a6 d6 f4 ff callq 45450 <log_assert_failed_realm@plt> f7daa: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) f7db0: 41 56 push %r14 f7db2: 41 55 push %r13 f7db4: 41 54 push %r12 f7db6: 55 push %rbp [root@seventh ~]# If we tried the the reported address before this patch: [root@seventh ~]# objdump -d --start-address 0x00007fe406465d80 /usr/lib/systemd/libsystemd-shared-241.so | head -20 /usr/lib/systemd/libsystemd-shared-241.so: file format elf64-x86-64 [root@seventh ~]# Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200227043939.4403-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
7eec00a747 |
perf symbols: Consolidate symbol fixup issue
After copying Arm64's perf archive with object files and perf.data file to x86 laptop, the x86's perf kernel symbol resolution fails. It outputs 'unknown' for all symbols parsing. This issue is root caused by the function elf__needs_adjust_symbols(), x86 perf tool uses one weak version, Arm64 (and powerpc) has rewritten their own version. elf__needs_adjust_symbols() decides if need to parse symbols with the relative offset address; but x86 building uses the weak function which misses to check for the elf type 'ET_DYN', so that it cannot parse symbols in Arm DSOs due to the wrong result from elf__needs_adjust_symbols(). The DSO parsing should not depend on any specific architecture perf building; e.g. x86 perf tool can parse Arm and Arm64 DSOs, vice versa. And confirmed by Naveen N. Rao that powerpc64 kernels are not being built as ET_DYN anymore and change to ET_EXEC. This patch removes the arch specific functions for Arm64 and powerpc and changes elf__needs_adjust_symbols() as a common function. In the common elf__needs_adjust_symbols(), it checks an extra condition 'ET_DYN' for elf header type. With this fixing, the Arm64 DSO can be parsed properly with x86's perf tool. Before: # perf script main 3258 1 branches: 0 [unknown] ([unknown]) => ffff800010c4665c [unknown] ([kernel.kallsyms]) main 3258 1 branches: ffff800010c46670 [unknown] ([kernel.kallsyms]) => ffff800010c4eaec [unknown] ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4eaec [unknown] ([kernel.kallsyms]) => ffff800010c4eb00 [unknown] ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4eb08 [unknown] ([kernel.kallsyms]) => ffff800010c4e780 [unknown] ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4e7a0 [unknown] ([kernel.kallsyms]) => ffff800010c4eeac [unknown] ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4eebc [unknown] ([kernel.kallsyms]) => ffff800010c4ed80 [unknown] ([kernel.kallsyms]) After: # perf script main 3258 1 branches: 0 [unknown] ([unknown]) => ffff800010c4665c coresight_timeout+0x54 ([kernel.kallsyms]) main 3258 1 branches: ffff800010c46670 coresight_timeout+0x68 ([kernel.kallsyms]) => ffff800010c4eaec etm4_enable_hw+0x3cc ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4eaec etm4_enable_hw+0x3cc ([kernel.kallsyms]) => ffff800010c4eb00 etm4_enable_hw+0x3e0 ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4eb08 etm4_enable_hw+0x3e8 ([kernel.kallsyms]) => ffff800010c4e780 etm4_enable_hw+0x60 ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4e7a0 etm4_enable_hw+0x80 ([kernel.kallsyms]) => ffff800010c4eeac etm4_enable+0x2d4 ([kernel.kallsyms]) main 3258 1 branches: ffff800010c4eebc etm4_enable+0x2e4 ([kernel.kallsyms]) => ffff800010c4ed80 etm4_enable+0x1a8 ([kernel.kallsyms]) v3: Changed to check for ET_DYN across all architectures. v2: Fixed Arm64 and powerpc native building. Reported-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Allison Randal <allison@lohutok.net> Cc: Enrico Weigelt <info@metux.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: John Garry <john.garry@huawei.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Link: http://lore.kernel.org/lkml/20200306015759.10084-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
d4953f7ef1 |
perf parse-events: Fix 3 use after frees found with clang ASAN
Reproducible with a clang asan build and then running perf test in particular 'Parse event definition strings'. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20200314170356.62914-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ingo Molnar
|
d1c9f7d117 |
perf/core improvements and fixes:
perf record: Alexey Budankov: - Fix binding of AIO user space buffers to nodes maps: Dominik b. Czarnota: - Fix off by one in strncpy() size argument. Arnaldo Carvalho de Melo: - Use strstarts() to look for Android libraries. Ian Rogers: - Give synthetic mmap events an inode generation. man pages: Ian Rogers: - Set man page date to last git commit. perf test: Ian Rogers: - Print if shell directory isn't present. perf report: Jin Yao: - Fix no branch type statistics report issue. perf expr: Jiri Olsa: - Fix copy/paste mistake vendor events: Kan Liang: - Support metric constraints. vendor events intel: Kan Liang: - Add NO_NMI_WATCHDOG metric constraint. vendor events s390: Thomas Richter: - Add new deflate counters for IBM z15. ARM cs-etm: Leo Yan: - Last branch improvements. intel-pt: Adrian Hunter: - Update intel-pt.txt file with new location of the documentation. - Add Intel PT man page references. - Rename intel-pt.txt and put it in man page format. perl scripting: Michael Petlan: - Add common_callchain to fix argument order. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXnFBiwAKCRCyPKLppCJ+ J4sOAQDTh5w3GFDOKzFHLqXWOE9mlsXnS7tHdkypuRweBpuQXQEA0Sq125ludwe7 pzZ1MFqZJ85lw0mfDqBV9E1PlgQz8Q8= =1uH9 -----END PGP SIGNATURE----- Merge tag 'perf-core-for-mingo-5.7-20200317' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf record: Alexey Budankov: - Fix binding of AIO user space buffers to nodes maps: Dominik b. Czarnota: - Fix off by one in strncpy() size argument. Arnaldo Carvalho de Melo: - Use strstarts() to look for Android libraries. Ian Rogers: - Give synthetic mmap events an inode generation. man pages: Ian Rogers: - Set man page date to last git commit. perf test: Ian Rogers: - Print if shell directory isn't present. perf report: Jin Yao: - Fix no branch type statistics report issue. perf expr: Jiri Olsa: - Fix copy/paste mistake vendor events: Kan Liang: - Support metric constraints. vendor events intel: Kan Liang: - Add NO_NMI_WATCHDOG metric constraint. vendor events s390: Thomas Richter: - Add new deflate counters for IBM z15. ARM cs-etm: Leo Yan: - Last branch improvements. intel-pt: Adrian Hunter: - Update intel-pt.txt file with new location of the documentation. - Add Intel PT man page references. - Rename intel-pt.txt and put it in man page format. perl scripting: Michael Petlan: - Add common_callchain to fix argument order. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Conflicts: tools/perf/util/map.c |
||
Ingo Molnar
|
409e1a3140 |
Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
Ingo Molnar
|
db5d85ce82 |
perf/urgent fixes:
perf probe: Masami Hiramatsu: - Fix deletion of multiple probe events. - Fix userspace libraries handling by not depending on dwfl_module_addrsym(). Event parsing: Ian Rogers: - Fix reading of invalid memory in event parsing. python binding: Ilie Halip: - Fix clang detection when using CC=clang-version. build: Masami Hiramatsu: - Fix O= use with relative paths. Android: Dominik b. Czarnota: - Fix off by one in strncpy() size argument when handling Android libraries. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXmZSRAAKCRCyPKLppCJ+ JwRrAQCrtJnNothjpZuzQTAouu4vA5BRoUOQvrvN1CSK+baaMwEAonaSfWGgYDDp W9Vt51C9iNFY/Oz3vI11s/gefyyrAw4= =bi0c -----END PGP SIGNATURE----- Merge tag 'perf-urgent-for-mingo-5.6-20200309' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: perf probe: Masami Hiramatsu: - Fix deletion of multiple probe events. - Fix userspace libraries handling by not depending on dwfl_module_addrsym(). Event parsing: Ian Rogers: - Fix reading of invalid memory in event parsing. python binding: Ilie Halip: - Fix clang detection when using CC=clang-version. build: Masami Hiramatsu: - Fix O= use with relative paths. Android: Dominik b. Czarnota: - Fix off by one in strncpy() size argument when handling Android libraries. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
Jiri Olsa
|
59a08b4b3b |
perf expr: Fix copy/paste mistake
Copy/paste leftover from recent refactor.
Fixes:
|
||
Jin Yao
|
c3b10649a8 |
perf report: Fix no branch type statistics report issue
Previously we could get the report of branch type statistics. For example: # perf record -j any,save_type ... # t perf report --stdio # # Branch Statistics: # COND_FWD: 40.6% COND_BWD: 4.1% CROSS_4K: 24.7% CROSS_2M: 12.3% COND: 44.7% UNCOND: 0.0% IND: 6.1% CALL: 24.5% RET: 24.7% But now for the recent perf, it can't report the branch type statistics. It's a regression issue caused by commit |
||
Ian Rogers
|
3b7a15b064 |
perf tools: Give synthetic mmap events an inode generation
When mmap2 events are synthesized the ino_generation field isn't being set leading to uninitialized memory being compared. Caught with clang's -fsanitize=memory: ==124733==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x55a96a6a65cc in __dso_id__cmp tools/perf/util/dsos.c:23:6 #1 0x55a96a6a81d5 in dso_id__cmp tools/perf/util/dsos.c:38:9 #2 0x55a96a6a717f in __dso__cmp_long_name tools/perf/util/dsos.c:74:15 #3 0x55a96a6a6c4c in __dsos__findnew_link_by_longname_id tools/perf/util/dsos.c:106:12 #4 0x55a96a6a851e in __dsos__findnew_by_longname_id tools/perf/util/dsos.c:178:9 #5 0x55a96a6a7798 in __dsos__find_id tools/perf/util/dsos.c:191:9 #6 0x55a96a6a7b57 in __dsos__findnew_id tools/perf/util/dsos.c:251:20 #7 0x55a96a6a7a57 in dsos__findnew_id tools/perf/util/dsos.c:259:17 #8 0x55a96a7776ae in machine__findnew_dso_id tools/perf/util/machine.c:2709:9 #9 0x55a96a77dfcf in map__new tools/perf/util/map.c:193:10 #10 0x55a96a77240a in machine__process_mmap2_event tools/perf/util/machine.c:1670:8 #11 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9 #12 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9 #13 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9 #14 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7 #15 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9 #16 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3 #17 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9 #18 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9 #19 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8 #20 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2 #21 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9 #22 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9 #23 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4 #24 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9 #25 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11 #26 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8 #27 0x55a96a284097 in run_argv tools/perf/perf.c:408:2 #28 0x55a96a282223 in main tools/perf/perf.c:538:3 Uninitialized value was stored to memory at #1 0x55a96a6a18f7 in dso__new_id tools/perf/util/dso.c:1230:14 #2 0x55a96a6a78ee in __dsos__addnew_id tools/perf/util/dsos.c:233:20 #3 0x55a96a6a7bcc in __dsos__findnew_id tools/perf/util/dsos.c:252:21 #4 0x55a96a6a7a57 in dsos__findnew_id tools/perf/util/dsos.c:259:17 #5 0x55a96a7776ae in machine__findnew_dso_id tools/perf/util/machine.c:2709:9 #6 0x55a96a77dfcf in map__new tools/perf/util/map.c:193:10 #7 0x55a96a77240a in machine__process_mmap2_event tools/perf/util/machine.c:1670:8 #8 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9 #9 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9 #10 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9 #11 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7 #12 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9 #13 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3 #14 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9 #15 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9 #16 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8 #17 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2 #18 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9 #19 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9 Uninitialized value was stored to memory at #0 0x55a96a7725af in machine__process_mmap2_event tools/perf/util/machine.c:1646:25 #1 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9 #2 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9 #3 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9 #4 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7 #5 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9 #6 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3 #7 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9 #8 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9 #9 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8 #10 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2 #11 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9 #12 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9 #13 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4 #14 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9 #15 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11 #16 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8 #17 0x55a96a284097 in run_argv tools/perf/perf.c:408:2 #18 0x55a96a282223 in main tools/perf/perf.c:538:3 Uninitialized value was created by a heap allocation #0 0x55a96a22f60d in malloc llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:925:3 #1 0x55a96a882948 in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:655:15 #2 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9 #3 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9 #4 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8 #5 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2 #6 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9 #7 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9 #8 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4 #9 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9 #10 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11 #11 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8 #12 0x55a96a284097 in run_argv tools/perf/perf.c:408:2 #13 0x55a96a282223 in main tools/perf/perf.c:538:3 SUMMARY: MemorySanitizer: use-of-uninitialized-value tools/perf/util/dsos.c:23:6 in __dso_id__cmp Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20200313053129.131264-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Linus Torvalds
|
e99bc917fe |
A pile of perf fixes:
- AMD uncore driver: Replace the open coded sanity check with the core variant, which provides the correct error code and also leaves a hint in dmesg - tools: - Fix the stdio input handling with glibc versions >= 2.28 - Unbreak the futex-wake benchmark which was reduced to 0 test threads due to the conversion to cpumaps - Initialize sigaction structs before invoking sys_sigactio() - Plug the mapfile memory leak in perf jevents - Fix off by one relative directory includes - Fix an undefined string comparison in perf diff -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl5uQuETHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoVeLEAC3lJ8jRzGfETQJFyS4C+vj1r+Jglvq Hi7Zd8hLDAd+F/aO2/DMgHkKLqpq+sj9qjnPv0Mu/eAS2AbOC3Q4Nz1vm0mxfmyB D6+/t3O2t01hyCJ70g8z7HgJclYyLc+JU72F37UcMCBJNHKFUx6ZrgMOPFRwebc6 aUgyObX5YJ7h35Bl0kYLB0z4q1Znvus3YlFxrEOF78Xldx7zjTJOBsXoDdBjcWVP axtvhOnI3aR8E08a+1nbOmE79qSkscneXY7pg0FVDs9/Zq+38BEOVlzDC5aRG3Rm 4fmty+NO3zOe663kNAGTJ/UQu1fIXGn+6rZ+5lH2pdtgkdeZN6zoVNQFVZrCarhC 9Skrgz2dZ7DQe6/VwM7Z20oChh5V9q/207Rr2w/6+hmtQ/mnriWpXODZxPevc8kN KYHj3Lmo63MrSWIp4Qm4U6wMC9LOGZDUojPs0zbd3prhPoRGVlivTbkQ497Rht00 BW8TCFhKhIqQJyE72KPI1zlmb0piihCHmMUi1XtuRi+3LpGFPQGXHBAxVrT9HJuF 1zGr9VeiY8XtHWBdYoD176aOD8wO36mABHkDo2DY7AmkyI8OefGj5EFwtnr+e1aF F1LRYw+IGn4kMn35NZVNiJUisGzVWGIrWGVCGlTdoKgm3hhVyoRuPKCCzV2GVXd+ 3hjvmSY9aFmrMw== =uJcr -----END PGP SIGNATURE----- Merge tag 'perf-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A pile of perf fixes: Kernel side: - AMD uncore driver: Replace the open coded sanity check with the core variant, which provides the correct error code and also leaves a hint in dmesg Tooling: - Fix the stdio input handling with glibc versions >= 2.28 - Unbreak the futex-wake benchmark which was reduced to 0 test threads due to the conversion to cpumaps - Initialize sigaction structs before invoking sys_sigactio() - Plug the mapfile memory leak in perf jevents - Fix off by one relative directory includes - Fix an undefined string comparison in perf diff" * tag 'perf-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag tools: Fix off-by 1 relative directory includes perf jevents: Fix leak of mapfile memory perf bench: Clear struct sigaction before sigaction() syscall perf bench futex-wake: Restore thread count default to online CPU count perf top: Fix stdio interface input handling with glibc 2.28+ perf diff: Fix undefined string comparision spotted by clang's -Wstring-compare perf symbols: Don't try to find a vmlinux file when looking for kernel modules perf bench: Share some global variables to fix build with gcc 10 perf parse-events: Use asprintf() instead of strncpy() to read tracepoint files perf env: Do not return pointers to local variables perf tests bp_account: Make global variable static |
||
Linus Torvalds
|
78511edc2d |
Power management fix for 5.6-rc6
Fix cpupower utility build failures with -fno-common enabled (Mike Gilbert). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl5rxVcSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxRTsP/2ZB8INWOBzJ32uaCErhVwKYD2QKzdvi i8RUhifg4zkHmPkauc8MwxTerNms8azf0xLiygsmNEI7sFVTmcJk4YFtA0+W8MuR xgor76VgEG7NIErzedCrKpXoosspFqMCICofTpdhGsbzRxK1m5W5ibW7cLpFE+Zd UZc9VSGfKF1fD5rmquCiKxzOxpJSYVTwjwArYeLmV7H+ExM+xq/WwARXcWu6QJCf Yy1g8Qh46Ky1zzWP/MzNaYcQjOH1AiVKCG48DBflzkdbGFYCuBA30XsLLyxK6+h8 Sqcc2MM6w1oQFFSsgt4d8dwCV1prkpbUnCVKDkCuTZtpTG1gtkki6NLkWSnTdSTI vWo8XOHgl7LefsBsNTxGlvZDaPhsHeSwFjkc4f7pzCw673CFcWrQ1xNo6PdEh/fs dVtZeoShe6JEPq5MMTHHSCoLzi3IyVWdkWsY0ycZNaOa/v8HRFYs8q/M4XQ04Cpy rpaqR/fOk78TKkcVcB1hhYIW0ZE+2gVUmVadSm+Eyde471fg1KeJ2d+I2Lga+dpH iGoJ/t1sSS/8/gijG+7r4Nfl/8lgPSEj+JhXyM/1uCkw5xl4OKXlfsGFft4k2o6B +ZzZASiDgfnT0VD3trsmInLhIaGvX1MPryy+Z3EnFaT6yjm+xLD7XpHfz6svnYj1 E0CEuNIYRofw =0wdT -----END PGP SIGNATURE----- Merge tag 'pm-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Fix cpupower utility build failures with -fno-common enabled (Mike Gilbert)" * tag 'pm-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpupower: avoid multiple definition with gcc -fno-common |
||
Ian Rogers
|
b2bf666070 |
perf test: Print if shell directory isn't present
If the shell test directory isn't present the exit code will be 255 but with no error messages printed. Add an error message. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200313005602.45236-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Linus Torvalds
|
1b51f69461 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller: "It looks like a decent sized set of fixes, but a lot of these are one liner off-by-one and similar type changes: 1) Fix netlink header pointer to calcular bad attribute offset reported to user. From Pablo Neira Ayuso. 2) Don't double clear PHY interrupts when ->did_interrupt is set, from Heiner Kallweit. 3) Add missing validation of various (devlink, nl802154, fib, etc.) attributes, from Jakub Kicinski. 4) Missing *pos increments in various netfilter seq_next ops, from Vasily Averin. 5) Missing break in of_mdiobus_register() loop, from Dajun Jin. 6) Don't double bump tx_dropped in veth driver, from Jiang Lidong. 7) Work around FMAN erratum A050385, from Madalin Bucur. 8) Make sure ARP header is pulled early enough in bonding driver, from Eric Dumazet. 9) Do a cond_resched() during multicast processing of ipvlan and macvlan, from Mahesh Bandewar. 10) Don't attach cgroups to unrelated sockets when in interrupt context, from Shakeel Butt. 11) Fix tpacket ring state management when encountering unknown GSO types. From Willem de Bruijn. 12) Fix MDIO bus PHY resume by checking mdio_bus_phy_may_suspend() only in the suspend context. From Heiner Kallweit" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits) net: systemport: fix index check to avoid an array out of bounds access tc-testing: add ETS scheduler to tdc build configuration net: phy: fix MDIO bus PM PHY resuming net: hns3: clear port base VLAN when unload PF net: hns3: fix RMW issue for VLAN filter switch net: hns3: fix VF VLAN table entries inconsistent issue net: hns3: fix "tc qdisc del" failed issue taprio: Fix sending packets without dequeueing them net: mvmdio: avoid error message for optional IRQ net: dsa: mv88e6xxx: Add missing mask of ATU occupancy register net: memcg: fix lockdep splat in inet_csk_accept() s390/qeth: implement smarter resizing of the RX buffer pool s390/qeth: refactor buffer pool code s390/qeth: use page pointers to manage RX buffer pool seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number net: dsa: Don't instantiate phylink for CPU/DSA ports unless needed net/packet: tpacket_rcv: do not increment ring index on drop sxgbe: Fix off by one in samsung driver strncpy size arg net: caif: Add lockdep expression to RCU traversal primitive MAINTAINERS: remove Sathya Perla as Emulex NIC maintainer ... |
||
Davide Caratti
|
9d0e0cd9a5 |
tc-testing: add ETS scheduler to tdc build configuration
add CONFIG_NET_SCH_ETS to 'config', otherwise test suites using this file
to perform a full tdc run will encounter the following warning:
ok 645 e90e - Add ETS qdisc using bands # skipped - "-----> teardown stage" did not complete successfully
Fixes:
|
||
Alexey Budankov
|
44d462acc0 |
perf record: Fix binding of AIO user space buffers to nodes
Correct maxnode parameter value passed to mbind() syscall to be the
amount of node mask bits to analyze plus 1. Dynamically allocate node
mask memory depending on the index of node of cpu being profiled.
Fixes:
|
||
Michael Petlan
|
67439d555f |
perf scripting perl: Add common_callchain to fix argument order
Since common_callchain has been added to the argument array, we need to reflect it in perl-based scripts, because otherwise the following args would be shifted and thus incorrect. E.g. rw-by-pid and calculation of read and written bytes: Before: read counts by pid: pid comm # reads bytes_requested bytes_read ------ -------------------- ----------- ---------- ---------- 19301 dd 4 424510450039736 0 After: read counts by pid: pid comm # reads bytes_requested bytes_read ------ -------------------- ----------- ---------- ---------- 19301 dd 4 9536 4341 Committer testing: To see before after first do: # perf script record rw-by-pid ^C Now you'll have a perf.data file to report on, then do before and after using: # perf script report rw-by-pid Anbd notice the bytes_request/bytes_read, as above. Signed-off-by: Michael Petlan <mpetlan@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Benjamin Salon <bsalon@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> LPU-Reference: 20200311132836.12693-1-mpetlan@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Adrian Hunter
|
ec2eab9deb |
perf intel-pt: Update intel-pt.txt file with new location of the documentation
Make it easy for people looking in intel-pt.txt to find the new file. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200311122034.3697-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Adrian Hunter
|
870d325b15 |
perf intel-pt: Add Intel PT man page references
Add references to Intel PT man page in man pages of associated tools. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200311122034.3697-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Adrian Hunter
|
97256d1a2a |
perf intel-pt: Rename intel-pt.txt and put it in man page format
Make the Intel PT documentation into a man page. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200311122034.3697-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
0c2d041232 |
perf doc: Set man page date to last git commit
Currently the man page dates reflect the date the man pages were built. This patch adjusts the date so that the date is when then man page last had a commit against it. The date is generated using 'git log'. Committer testing: $ git log -1 --pretty="format:%cd" --date=short tools/perf/Documentation/perf-top.txt 2020-01-14 Before: rm -rf /tmp/build/perf mkdir -p /tmp/build/perf make -C tools/perf O=/tmp/build/perf/ install $ date Wed 11 Mar 2020 10:21:19 AM -03 $ man perf-top | tail -1 perf 03/11/2020 PERF-TOP(1) $ After: rm -rf /tmp/build/perf mkdir -p /tmp/build/perf make -C tools/perf O=/tmp/build/perf/ install $ date $ date Wed 11 Mar 2020 10:24:06 AM -03 $ man perf-top | tail -1 perf 2020-01-14 PERF-TOP(1) $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masanari Iida <standby24x7@gmail.com> Cc: Mukesh Ojha <mojha@codeaurora.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20200311052110.23132-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
bc010dd657 |
perf cs-etm: Fix unsigned variable comparison to zero
The variable 'offset' in function cs_etm__sample() is u64 type, it's not appropriate to check it with 'while (offset > 0)'; this patch changes to 'while (offset)'. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Mike Leach <mike.leach@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200219021811.20067-6-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
695378b567 |
perf cs-etm: Optimize copying last branches
If an instruction range packet can generate multiple instruction samples, these samples share the same last branches; it's not necessary to copy the same last branches repeatedly for these samples within the same packet. This patch moves out the last branches copying from function cs_etm__synth_instruction_sample(), and execute it prior to generating instruction samples. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Mike Leach <mike.leach@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200219021811.20067-5-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
c9f5baa136 |
perf cs-etm: Correct synthesizing instruction samples
When 'etm->instructions_sample_period' is less than 'tidq->period_instructions', the function cs_etm__sample() cannot handle this case properly with its logic. Let's see below flow as an example: - If we set itrace option '--itrace=i4', then function cs_etm__sample() has variables with initialized values: tidq->period_instructions = 0 etm->instructions_sample_period = 4 - When the first packet is coming: packet->instr_count = 10; the number of instructions executed in this packet is 10, thus update period_instructions as below: tidq->period_instructions = 0 + 10 = 10 instrs_over = 10 - 4 = 6 offset = 10 - 6 - 1 = 3 tidq->period_instructions = instrs_over = 6 - When the second packet is coming: packet->instr_count = 10; in the second pass, assume 10 instructions in the trace sample again: tidq->period_instructions = 6 + 10 = 16 instrs_over = 16 - 4 = 12 offset = 10 - 12 - 1 = -3 -> the negative value tidq->period_instructions = instrs_over = 12 So after handle these two packets, there have below issues: The first issue is that cs_etm__instr_addr() returns the address within the current trace sample of the instruction related to offset, so the offset is supposed to be always unsigned value. But in fact, function cs_etm__sample() might calculate a negative offset value (in handling the second packet, the offset is -3) and pass to cs_etm__instr_addr() with u64 type with a big positive integer. The second issue is it only synthesizes 2 samples for sample period = 4. In theory, every packet has 10 instructions so the two packets have total 20 instructions, 20 instructions should generate 5 samples (4 x 5 = 20). This is because cs_etm__sample() only calls once cs_etm__synth_instruction_sample() to generate instruction sample per range packet. This patch fixes the logic in function cs_etm__sample(); the basic idea for handling coming packet is: - To synthesize the first instruction sample, it combines the left instructions from the previous packet and the head of the new packet; then generate continuous samples with sample period; - At the tail of the new packet, if it has the rest instructions, these instructions will be left for the sequential sample. Suggested-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Mike Leach <mike.leach@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200219021811.20067-4-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
f1410028c7 |
perf cs-etm: Continuously record last branch
Every time synthesize instruction sample, the last branch recording will be reset. This is fine if the instruction period is big enough, for example if use the option '--itrace=i100000', the last branch array is reset for every sample with 100000 instructions per period; before generate the next instruction sample, there has the sufficient packets coming to fill the last branch array. On the other hand, if set a very small period, the packets will be significantly reduced between two continuous instruction samples, thus the last branch array is almost empty for new instruction sample by frequently resetting. To allow the last branches to work properly for any instruction periods, this patch avoids to reset the last branch for every instruction sample and only reset it when flush the trace data. The last branches will be reset only for two cases, one is for trace starting, another case is for discontinuous trace; other cases can keep recording last branches for continuous instruction samples. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Mike Leach <mike.leach@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Walker <robert.walker@arm.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight ml <coresight@lists.linaro.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20200219021811.20067-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |