linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-04 22:36:42 +07:00

Author	SHA1	Message	Date
Frederic Weisbecker	97ea1a7fa6	perf tools: Fix thread comm resolution in perf sched This reverts commit `9a92b479b2` ("perf tools: Improve thread comm resolution in perf sched") and fixes the real bug. The bug was elsewhere: We are failing to resolve thread names in perf sched because the table of threads we are building, on top of comm events, has a per process granularity. But perf sched, unlike the other perf tools, needs a per thread granularity as we are profiling every tasks individually. So fix it by building our threads table using the tid instead of the pid as the thread identifier. v2: Revert the previous fix - it is not really needed Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1255028657-11158-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-08 21:10:21 +02:00
Frederic Weisbecker	9a92b479b2	perf tools: Improve thread comm resolution in perf sched When we get sched traces that involve a task that was already created before opening the event, we won't have the comm event for it. So if we can't find the comm event for a given thread, we look at the traces that may contain these informations. Before: ata/1:371 \| 0.000 ms \| 1 \| avg: 3988.693 ms \| max: 3988.693 ms \| kondemand/1:421 \| 0.096 ms \| 3 \| avg: 345.346 ms \| max: 1035.989 ms \| kondemand/0:420 \| 0.025 ms \| 3 \| avg: 421.332 ms \| max: 964.014 ms \| :5124:5124 \| 0.103 ms \| 5 \| avg: 74.082 ms \| max: 277.194 ms \| :6244:6244 \| 0.691 ms \| 9 \| avg: 125.655 ms \| max: 271.306 ms \| firefox:5080 \| 0.924 ms \| 5 \| avg: 53.833 ms \| max: 257.828 ms \| npviewer.bin:6225 \| 21.871 ms \| 53 \| avg: 22.462 ms \| max: 220.835 ms \| :6245:6245 \| 9.631 ms \| 21 \| avg: 41.864 ms \| max: 213.349 ms \| After: ata/1:371 \| 0.000 ms \| 1 \| avg: 3988.693 ms \| max: 3988.693 ms \| kondemand/1:421 \| 0.096 ms \| 3 \| avg: 345.346 ms \| max: 1035.989 ms \| kondemand/0:420 \| 0.025 ms \| 3 \| avg: 421.332 ms \| max: 964.014 ms \| firefox:5124 \| 0.103 ms \| 5 \| avg: 74.082 ms \| max: 277.194 ms \| npviewer.bin:6244 \| 0.691 ms \| 9 \| avg: 125.655 ms \| max: 271.306 ms \| firefox:5080 \| 0.924 ms \| 5 \| avg: 53.833 ms \| max: 257.828 ms \| npviewer.bin:6225 \| 21.871 ms \| 53 \| avg: 22.462 ms \| max: 220.835 ms \| npviewer.bin:6245 \| 9.631 ms \| 21 \| avg: 41.864 ms \| max: 213.349 ms \| Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1255012632-7882-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-08 16:56:33 +02:00
Arnaldo Carvalho de Melo	439d473b47	perf tools: Rewrite and improve support for kernel modules Representing modules as struct map entries, backed by a DSO, etc, using /proc/modules to find where the module is loaded. DSOs now can have a short and long name, so that in verbose mode we can show exactly which .ko or vmlinux image was used. As kernel modules now are a DSO separate from the kernel, we can ask for just the hits for a particular set of kernel modules, just like we can do with shared libraries: [root@doppio linux-2.6-tip]# perf report -n --vmlinux /home/acme/git/build/tip-recvmmsg/vmlinux --modules --dsos \[drm\] \| head -15 84.58% 13266 Xorg [k] drm_clflush_pages 4.02% 630 Xorg [k] trace_kmalloc.clone.0 3.95% 619 Xorg [k] drm_ioctl 2.07% 324 Xorg [k] drm_addbufs 1.68% 263 Xorg [k] drm_gem_close_ioctl 0.77% 120 Xorg [k] drm_setmaster_ioctl 0.70% 110 Xorg [k] drm_lastclose 0.68% 106 Xorg [k] drm_open 0.54% 85 Xorg [k] drm_mm_search_free [root@doppio linux-2.6-tip]# Specifying --dsos /lib/modules/2.6.31-tip/kernel/drivers/gpu/drm/drm.ko would have the same effect. Allowing specifying just 'drm.ko' is left for another patch. Processing kallsyms so that per kernel module struct map are instantiated was also left for another patch. That will allow removing the module name from each of its symbols. struct symbol was reduced by removing the ->module backpointer and moving it (well now the map) to struct symbol_entry in perf top, that is its only user right now. The total linecount went down by ~500 lines. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-02 10:48:42 +02:00
Arnaldo Carvalho de Melo	1b46cddfcc	perf tools: Use rb_tree for maps Threads can have many and kernel modules will be represented as a tree of maps as well. Ah, and for a perf.data with 146607 samples: Before: [root@doppio ~]# perf stat -r 5 perf report > /dev/null Performance counter stats for 'perf report' (5 runs): 699.823680 task-clock-msecs # 0.991 CPUs ( +- 0.454% ) 74 context-switches # 0.000 M/sec ( +- 1.709% ) 2 CPU-migrations # 0.000 M/sec ( +- 17.008% ) 23114 page-faults # 0.033 M/sec ( +- 0.000% ) 1381257019 cycles # 1973.721 M/sec ( +- 0.290% ) 1456894438 instructions # 1.055 IPC ( +- 0.007% ) 18779818 cache-references # 26.835 M/sec ( +- 0.380% ) 641799 cache-misses # 0.917 M/sec ( +- 1.200% ) 0.705972729 seconds time elapsed ( +- 0.501% ) [root@doppio ~]# After Performance counter stats for 'perf report' (5 runs): 691.261451 task-clock-msecs # 0.993 CPUs ( +- 0.307% ) 72 context-switches # 0.000 M/sec ( +- 0.829% ) 6 CPU-migrations # 0.000 M/sec ( +- 18.409% ) 23127 page-faults # 0.033 M/sec ( +- 0.000% ) 1366395876 cycles # 1976.670 M/sec ( +- 0.153% ) 1443136016 instructions # 1.056 IPC ( +- 0.012% ) 17956402 cache-references # 25.976 M/sec ( +- 0.325% ) 661924 cache-misses # 0.958 M/sec ( +- 1.335% ) 0.696127275 seconds time elapsed ( +- 0.377% ) I.e. we see some speedup too. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> LKML-Reference: <20090928174846.GA3361@ghostprotocols.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-30 13:57:56 +02:00
John Kacur	8b40f521cf	perf tools: Protect header files with a consistent style There was a colorful mix of header guards - standardize them. Signed-off-by: John Kacur <jkacur@redhat.com> LKML-Reference: <alpine.LFD.2.00.0909241756530.11383@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-24 21:27:51 +02:00
Ingo Molnar	0ec04e16d0	perf sched: Add 'perf sched map' scheduling event map printout This prints a textual context-switching outline of workload captured via perf sched record. For example, on a 16 CPU box it outputs: N1 O1 . . . S1 . . . B0 . I0 C1 . M1 . 23002.773423 secs N1 O1 . Q0 . S1 . . . B0 . I0 C1 . M1 . 23002.773423 secs N1 O1 . Q0 . S1 . . . B0 . R1 C1 . M1 . 23002.773485 secs N1 O1 . Q0 . S1 . S0 . B0 . R1 C1 . M1 . 23002.773478 secs L0 O1 . Q0 . S1 . S0 . B0 . R1 C1 . M1 . 23002.773523 secs L0 O1 . . . S1 . S0 . B0 . R1 C1 . M1 . 23002.773531 secs L0 O1 . . . S1 . S0 . B0 . R1 C1 T1 M1 . 23002.773547 secs T1 => irqbalance:2089 L0 O1 . . . S1 . S0 . P0 . R1 C1 T1 M1 . 23002.773549 secs N1 O1 . . . S1 . S0 . P0 . R1 C1 T1 M1 . 23002.773566 secs N1 O1 . . . J0 . S0 . P0 . R1 C1 T1 M1 . 23002.773571 secs N1 O1 . . . J0 . S0 B0 P0 . R1 C1 T1 M1 . 23002.773592 secs N1 O1 . . . J0 . U0 B0 P0 . R1 C1 T1 M1 . 23002.773582 secs N1 O1 . . . S1 . U0 B0 P0 . R1 C1 T1 M1 . 23002.773604 secs N1 O1 . . . S1 . U0 B0 . . R1 C1 T1 M1 . 23002.773615 secs N1 O1 . . . S1 . U0 B0 . . K0 C1 T1 M1 . 23002.773631 secs N1 O1 . M0 . S1 . U0 B0 . . K0 C1 T1 M1 . 23002.773624 secs N1 O1 . M0 . S1 . U0 . . . K0 C1 T1 M1 . 23002.773644 secs N1 O1 . M0 . S1 . U0 . . . R1 C1 T1 M1 . 23002.773662 secs N1 O1 . M0 . S1 . . . . . R1 C1 T1 M1 . 23002.773648 secs N1 O1 . . . S1 . . . . . R1 C1 T1 M1 . 23002.773680 secs N1 O1 . . . L0 . . . . . R1 C1 T1 M1 . 23002.773717 secs N0 O1 . . . L0 . . . . . R1 C1 T1 M1 . 23002.773709 secs N1 O1 . . . L0 . . . . . R1 C1 T1 M1 . 23002.773747 secs Columns stand for individual CPUs, from CPU0 to CPU15, and the two-letter shortcuts stand for tasks that are running on a CPU. '' denotes the CPU that had the event. A dot signals an idle CPU. New tasks are assigned new two-letter shortcuts - when they occur first they are printed. In the above example 'T1' stood for irqbalance: T1 => irqbalance:2089 Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-16 16:41:30 +02:00
Ingo Molnar	b5fae128e4	perf sched: Clean up PID sorting logic Use a sort list for thread atoms insertion as well - instead of hardcoded for PID. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-13 10:22:50 +02:00
Frederic Weisbecker	5b447a6a13	perf tools: Librarize idle thread registration Librarize register_idle_thread() used by annotate and report. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1251693921-6579-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-31 10:04:48 +02:00
Frederic Weisbecker	6baa0a5ae0	perf tools: Factorize the thread code in a dedicated file Factorize the thread management code used by perf-annotate and perf-report in dedicated source and header files. v2: pass last_match by address so that it can actually be modified. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <1250245313-6995-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-15 16:10:19 +02:00

9 Commits