mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-12-25 10:23:55 +07:00
5cb52b5e16
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Intel Knights Landing support. (Harish Chegondi)

   - Intel Broadwell-EP uncore PMU support. (Kan Liang)

   - Core code improvements. (Peter Zijlstra)

   - Event filter, LBR and PEBS fixes. (Stephane Eranian)

   - Enable cycles:pp on Intel Atom. (Stephane Eranian)

   - Add cycles:ppp support for Skylake. (Andi Kleen)

   - Various x86 NMI overhead optimizations. (Andi Kleen)

   - Intel PT enhancements. (Takao Indoh)

   - AMD cache events fix. (Vince Weaver)

  Tons of tooling changes:

   - Show random perf tool tips in the 'perf report' bottom line
     (Namhyung Kim)

   - perf report now defaults to --group if the perf.data file has
     grouped events; try it with:

       # perf record -e '{cycles,instructions}' -a sleep 1
       [ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 1.093 MB perf.data (1247 samples) ]
       # perf report
       # Samples: 1K of event 'anon group { cycles, instructions }'
       # Event count (approx.): 1955219195
       #
       #       Overhead  Command      Shared Object      Symbol
          2.86%   0.22%  swapper      [kernel.kallsyms]  [k] intel_idle
          1.05%   0.33%  firefox      libxul.so          [.] js::SetObjectElement
          1.05%   0.00%  kworker/0:3  [kernel.kallsyms]  [k] gen6_ring_get_seqno
          0.88%   0.17%  chrome       chrome             [.] 0x0000000000ee27ab
          0.65%   0.86%  firefox      libxul.so          [.] js::ValueToId<(js::AllowGC)1>
          0.64%   0.23%  JS Helper    libxul.so          [.] js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::splay
          0.62%   1.27%  firefox      libxul.so          [.] js::GetIterator
          0.61%   1.74%  firefox      libxul.so          [.] js::NativeSetProperty
          0.61%   0.31%  firefox      libxul.so          [.] js::SetPropertyByDefining

   - Introduce the 'perf stat record/report' workflow:

     Generate perf.data files from 'perf stat', to tap into the
     scripting capabilities perf has instead of defining 'perf
     stat'-specific scripting support to calculate event ratios, etc.
     Simple example:

       $ perf stat record -e cycles usleep 1

        Performance counter stats for 'usleep 1':

                1,134,996      cycles

              0.000670644 seconds time elapsed

       $ perf stat report

        Performance counter stats for '/home/acme/bin/perf stat record -e cycles usleep 1':

                1,134,996      cycles

              0.000670644 seconds time elapsed

       $

     It generates PERF_RECORD_ userspace records to store the details:

       $ perf report -D | grep PERF_RECORD
       0xf0 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27637
       0x118 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535
       0x12a [0x40]: PERF_RECORD_STAT_CONFIG
       0x16a [0x30]: PERF_RECORD_STAT -1 -1
       0x19a [0x40]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text
       0x1da [0x18]: PERF_RECORD_STAT_ROUND
       [acme@ssdandy linux]$

     An effort was made to keep perf.data files generated this way from
     producing cryptic messages when processed by older tools.

     The 'perf script' bits need rebasing and will go up later.

   - Make command line options always available, even when they depend
     on some feature being enabled, warning the user about use of such
     options (Wang Nan)

   - Support hw breakpoint events (mem:0xAddress) in the default output
     mode in 'perf script' (Wang Nan)

   - Fixes and improvements for annotating ARM binaries, including
     support for ARM call and jump instructions; more work is needed to
     separate arch-specific code into tools/perf/arch/*/annotate/
     (Russell King)

   - Add an initial 'perf config' command, for now just with a --list
     option to show the contents of the configuration file in use and a
     basic man page describing its format; commands for doing edits and
     detailed documentation are being reviewed and proof-read
     (Taeung Song)

   - Allow BPF scriptlets to specify arguments to be fetched using DWARF
     info, using a prologue generated at compile/build time (He Kuang,
     Wang Nan)

   - Allow attaching BPF scriptlets to module symbols (Wang Nan)

   - Allow attaching BPF scriptlets to userspace code using uprobe
     (Wang Nan)

   - BPF programs can now specify 'perf probe' tunables via their
     section name, separating key=val pairs with semicolons (Wang Nan)

     Testing some of these new BPF features:

     Use case: get callchains when receiving SSL packets, filter them in
     the kernel, at an arbitrary place.

       # cat ssl.bpf.c
       #define SEC(NAME) __attribute__((section(NAME), used))

       struct pt_regs;

       SEC("func=__inet_lookup_established hnum")
       int func(struct pt_regs *ctx, int err, unsigned short port)
       {
               return err == 0 && port == 443;
       }

       char _license[] SEC("license") = "GPL";
       int _version SEC("version") = LINUX_VERSION_CODE;
       #
       # perf record -a -g -e ssl.bpf.c
       ^C[ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 0.787 MB perf.data (3 samples) ]
       # perf script | head -30
       swapper     0 [000] 58783.268118: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
                  8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
                  896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
                  8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
                  855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
                  8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
                  8572a8 process_backlog (/lib/modules/4.3.0+/build/vmlinux)
                  856b11 net_rx_action (/lib/modules/4.3.0+/build/vmlinux)
                  2a284b __do_softirq (/lib/modules/4.3.0+/build/vmlinux)
                  2a2ba3 irq_exit (/lib/modules/4.3.0+/build/vmlinux)
                  96b7a4 do_IRQ (/lib/modules/4.3.0+/build/vmlinux)
                  969807 ret_from_intr (/lib/modules/4.3.0+/build/vmlinux)
                  2dede5 cpu_startup_entry (/lib/modules/4.3.0+/build/vmlinux)
                  95d5bc rest_init (/lib/modules/4.3.0+/build/vmlinux)
                 1163ffa start_kernel ([kernel.vmlinux].init.text)
                 11634d7 x86_64_start_reservations ([kernel.vmlinux].init.text)
                 1163623 x86_64_start_kernel ([kernel.vmlinux].init.text)

       qemu-system-x86  9178 [003] 58785.792417: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
                  8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
                  896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
                  8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
                  855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
                  8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
                  856660 netif_receive_skb_internal (/lib/modules/4.3.0+/build/vmlinux)
                  8566ec netif_receive_skb_sk (/lib/modules/4.3.0+/build/vmlinux)
                    430a br_handle_frame_finish ([bridge])
                    48bc br_handle_frame ([bridge])
                  855f44 __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
                  8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
       #

     - Use the various 'perf probe' options to list functions and see
       what variables can be collected at any given point; experiment
       first by collecting without a filter, then filter; use it together
       with 'perf trace' and 'perf top', with or without callchains. If it
       explodes, please tell us!
   - Introduce a new callchain mode, "folded", which lists per-line
     representations of all callchains for a given histogram entry,
     making 'perf report' output easier to process with other tools,
     such as Brendan Gregg's flamegraph tools (Namhyung Kim)

     E.g.:

       # perf report | grep -v ^# | head
           18.37%     0.00%  swapper  [kernel.kallsyms]  [k] cpu_startup_entry
                   |
                   ---cpu_startup_entry
                      |
                      |--12.07%--start_secondary
                      |
                       --6.30%--rest_init
                                 start_kernel
                                 x86_64_start_reservations
                                 x86_64_start_kernel
       #

     Becomes, in "folded" mode:

       # perf report -g folded | grep -v ^# | head -5
           18.37%     0.00%  swapper  [kernel.kallsyms]  [k] cpu_startup_entry
       12.07% cpu_startup_entry;start_secondary
       6.30% cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
           16.90%     0.00%  swapper  [kernel.kallsyms]  [k] call_cpuidle
       11.23% call_cpuidle;cpu_startup_entry;start_secondary
       5.67% call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
           16.90%     0.00%  swapper  [kernel.kallsyms]  [k] cpuidle_enter
       11.23% cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
       5.67% cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
           15.12%     0.00%  swapper  [kernel.kallsyms]  [k] cpuidle_enter_state
       #

     The user can also select one of "count", "period" or "percent" as
     the first column.

  ... and lots of infrastructure enhancements, plus fixes and other
  changes, features I failed to list - see the shortlog and the git log
  for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (271 commits)
  perf evlist: Add --trace-fields option to show trace fields
  perf record: Store data mmaps for dwarf unwind
  perf libdw: Check for mmaps also in MAP__VARIABLE tree
  perf unwind: Check for mmaps also in MAP__VARIABLE tree
  perf unwind: Use find_map function in access_dso_mem
  perf evlist: Remove perf_evlist__(enable|disable)_event functions
  perf evlist: Make perf_evlist__open() open evsels with their cpus and threads (like perf record does)
  perf report: Show random usage tip on the help line
  perf hists: Export a couple of hist functions
  perf diff: Use perf_hpp__register_sort_field interface
  perf tools: Add overhead/overhead_children keys defaults via string
  perf tools: Remove list entry from struct sort_entry
  perf tools: Include all tools/lib directory for tags/cscope/TAGS targets
  perf script: Align event name properly
  perf tools: Add missing headers in perf's MANIFEST
  perf tools: Do not show trace command if it's not compiled in
  perf report: Change default to use event group view
  perf top: Decay periods in callchains
  tools lib: Move bitmap.[ch] from tools/perf/ to tools/{lib,include}/
  tools lib: Sync tools/lib/find_bit.c with the kernel
  ...
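The "folded" callchain lines in the merge message above are designed to be easy for other tools to consume. The following C helper is a minimal sketch, not code from perf. It makes these assumptions:

- the default "percent" first column is in use;
- callchain lines have the form "NN.NN% frame;frame;..." exactly as in the example output;
- histogram entry header lines can be recognized by their second percentage column. This is a heuristic based on the two-column (children/self) layout shown.

The names parse_folded, struct folded_chain and MAX_FRAMES are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FRAMES 64 /* arbitrary cap chosen for this sketch */

/* One callchain line of 'perf report -g folded' output, e.g.
 * "12.07% cpu_startup_entry;start_secondary". Hypothetical type. */
struct folded_chain {
	double percent;           /* leading value, e.g. 12.07 */
	int nr_frames;            /* number of ';'-separated frames */
	char *frames[MAX_FRAMES]; /* pointers into the caller's buffer */
};

/*
 * Parse LINE in place (it is modified). Returns 0 for a callchain line
 * and -1 for anything else: blank lines, or histogram entry headers such
 * as "18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry".
 * Frames beyond MAX_FRAMES are silently dropped.
 */
int parse_folded(char *line, struct folded_chain *fc)
{
	char *end, *chain, *frame;

	assert(line != NULL && fc != NULL);

	/* Leading "NN.NN%" value; strtod() skips leading whitespace. */
	fc->percent = strtod(line, &end);
	if (end == line || *end != '%')
		return -1;

	chain = end + 1;
	while (*chain == ' ' || *chain == '\t')
		chain++;
	chain[strcspn(chain, "\r\n")] = '\0'; /* drop trailing newline */
	if (*chain == '\0')
		return -1;

	/* Entry headers carry a second percentage column: reject them. */
	(void)strtod(chain, &end);
	if (end != chain && *end == '%')
		return -1;

	/* Split the remainder on ';' into individual frames. */
	fc->nr_frames = 0;
	frame = chain;
	while (frame != NULL && fc->nr_frames < MAX_FRAMES) {
		char *sep = strchr(frame, ';');

		if (sep != NULL)
			*sep = '\0';
		fc->frames[fc->nr_frames++] = frame;
		frame = (sep != NULL) ? sep + 1 : NULL;
	}
	return 0;
}
```

For "12.07% cpu_startup_entry;start_secondary" the helper yields a percent of 12.07 and the two frames "cpu_startup_entry" and "start_secondary". Frames keep the order perf printed them in: the entry symbol comes first, followed by its callers. A flamegraph-style consumer that expects root-first stacks may therefore need to reverse them.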
crypto
fpu
numachip
trace
uv
xen
a.out-core.h
acenv.h
acpi.h
agp.h
alternative-asm.h
alternative.h
amd_nb.h
apb_timer.h
apic_flat_64.h
apic.h
apicdef.h
apm.h
arch_hweight.h
archrandom.h
asm-offsets.h
asm.h
atomic64_32.h
atomic64_64.h
atomic.h
barrier.h
bios_ebda.h
bitops.h
boot.h
bootparam_utils.h
bug.h
bugs.h
cache.h
cacheflush.h
calgary.h
ce4100.h
checksum_32.h
checksum_64.h
checksum.h
clocksource.h
cmdline.h
cmpxchg_32.h
cmpxchg_64.h
cmpxchg.h
compat.h
cpu_device_id.h
cpu.h
cpufeature.h
cpumask.h
crash.h
current.h
debugreg.h
delay.h
desc_defs.h
desc.h
device.h
disabled-features.h
div64.h
dma-mapping.h
dma.h
dmi.h
dwarf2.h
e820.h
edac.h
efi.h
elf.h
emergency-restart.h
entry_arch.h
espfix.h
exec.h
fb.h
fixmap.h
floppy.h
frame.h
ftrace.h
futex.h
gart.h
genapic.h
geode.h
gpio.h
hardirq.h
highmem.h
hpet.h
hugetlb.h
hw_breakpoint.h
hw_irq.h
hypertransport.h
hypervisor.h
i8259.h
ia32_unistd.h
ia32.h
idle.h
imr.h
inat_types.h
inat.h
init.h
insn.h
inst.h
intel_mid_vrtc.h
intel_pmc_ipc.h
intel_pt.h
intel_scu_ipc.h
intel-mid.h
io_apic.h
io.h
iomap.h
iommu_table.h
iommu.h
iosf_mbi.h
ipi.h
irq_regs.h
irq_remapping.h
irq_vectors.h
irq_work.h
irq.h
irqdomain.h
irqflags.h
ist.h
jump_label.h
kasan.h
kbdleds.h
Kbuild
kdebug.h
kexec-bzimage64.h
kexec.h
kgdb.h
kmap_types.h
kmemcheck.h
kprobes.h
kvm_emulate.h
kvm_guest.h
kvm_host.h
kvm_para.h
lguest_hcall.h
lguest.h
linkage.h
livepatch.h
local64.h
local.h
mach_timer.h
mach_traps.h
math_emu.h
mc146818rtc.h
mce.h
microcode_amd.h
microcode_intel.h
microcode.h
misc.h
mmconfig.h
mmu_context.h
mmu.h
mmx.h
mmzone_32.h
mmzone_64.h
mmzone.h
module.h
mpspec_def.h
mpspec.h
mpx.h
mshyperv.h
msi.h
msidef.h
msr-index.h
msr-trace.h
msr.h
mtrr.h
mutex_32.h
mutex_64.h
mutex.h
mwait.h
nmi.h
nops.h
numa_32.h
numa.h
olpc_ofw.h
olpc.h
page_32_types.h
page_32.h
page_64_types.h
page_64.h
page_types.h
page.h
paravirt_types.h
paravirt.h
parport.h
pat.h
pci_64.h
pci_x86.h
pci-direct.h
pci-functions.h
pci.h
percpu.h
perf_event_p4.h
perf_event.h
pgalloc.h
pgtable_32_types.h
pgtable_32.h
pgtable_64_types.h
pgtable_64.h
pgtable_types.h
pgtable-2level_types.h
pgtable-2level.h
pgtable-3level_types.h
pgtable-3level.h
pgtable.h
platform_sst_audio.h
pm-trace.h
pmc_atom.h
pmem.h
posix_types.h
preempt.h
probe_roms.h
processor-cyrix.h
processor-flags.h
processor.h
prom.h
proto.h
ptrace.h
pvclock-abi.h
pvclock.h
qrwlock.h
qspinlock_paravirt.h
qspinlock.h
realmode.h
reboot_fixups.h
reboot.h
required-features.h
rio.h
rmwcc.h
rtc.h
rwsem.h
seccomp.h
sections.h
segment.h
serial.h
setup_arch.h
setup.h
shmparam.h
sigcontext.h
sigframe.h
sighandling.h
signal.h
simd.h
smap.h
smp.h
sparsemem.h
special_insns.h
spinlock_types.h
spinlock.h
sta2x11.h
stackprotector.h
stacktrace.h
string_32.h
string_64.h
string.h
suspend_32.h
suspend_64.h
suspend.h
svm.h
swiotlb.h
switch_to.h
sync_bitops.h
sys_ia32.h
syscall.h
syscalls.h
sysfb.h
tce.h
thread_info.h
time.h
timer.h
timex.h
tlb.h
tlbflush.h
topology.h
trace_clock.h
traps.h
tsc.h
uaccess_32.h
uaccess_64.h
uaccess.h
unaligned.h
unistd.h
uprobes.h
user32.h
user_32.h
user_64.h
user.h
vdso.h
vga.h
vgtod.h
virtext.h
vm86.h
vmx.h
vsyscall.h
vvar.h
word-at-a-time.h
x2apic.h
x86_init.h
xor_32.h
xor_64.h
xor_avx.h
xor.h