linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-24 06:20:53 +07:00

Author	SHA1	Message	Date
Frederic Weisbecker	10350ec362	perf: Cleanup perf lock broken states Use enum to get a human view of bad_hist indexes and put bad histogram output in its own function. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>	2010-05-09 13:45:26 +02:00
Hitoshi Mitake	26242d859c	perf lock: Add "info" subcommand for dumping misc information This adds the "info" subcommand to perf lock which can be used to dump metadata like threads or addresses of lock instances. "map" was removed because info should do the work for it. This will be useful not only for debugging but also for ordinary analyzing. v2: adding example of usage % sudo ./perf lock info -t \| Thread ID: comm \| 0: swapper \| 1: init \| 18: migration/5 \| 29: events/2 \| 32: events/5 \| 33: events/6 ... % sudo ./perf lock info -m \| Address of instance: name of class \| 0xffff8800b95adae0: &(&sighand->siglock)->rlock \| 0xffff8800bbb41ae0: &(&sighand->siglock)->rlock \| 0xffff8800bf165ae0: &(&sighand->siglock)->rlock \| 0xffff8800b9576a98: &p->cred_guard_mutex \| 0xffff8800bb890a08: &(&p->alloc_lock)->rlock \| 0xffff8800b9522a08: &(&p->alloc_lock)->rlock \| 0xffff8800bb8aaa08: &(&p->alloc_lock)->rlock \| 0xffff8800bba72a08: &(&p->alloc_lock)->rlock \| 0xffff8800bf18ea08: &(&p->alloc_lock)->rlock \| 0xffff8800b8a0d8a0: &(&ip->i_lock)->mr_lock \| 0xffff88009bf818a0: &(&ip->i_lock)->mr_lock \| 0xffff88004c66b8a0: &(&ip->i_lock)->mr_lock \| 0xffff8800bb6478a0: &(shost->host_lock)->rlock v3: fixed some problems Frederic pointed out * better rbtree tracking in dump_threads() * removed printf() and used pr_info() and pr_debug() Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> LKML-Reference: <1272863520-16179-1-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-05-09 13:45:24 +02:00
Tom Zanussi	454c407ec1	perf: add perf-inject builtin Currently, perf 'live mode' writes build-ids at the end of the session, which isn't actually useful for processing live mode events. What would be better would be to have the build-ids sent before any of the samples that reference them, which can be done by processing the event stream and retrieving the build-ids on the first hit. Doing that in perf-record itself, however, is off-limits. This patch introduces perf-inject, which does the same job while leaving perf-record untouched. Normal mode perf still records the build-ids at the end of the session as it should, but for live mode, perf-inject can be injected in between the record and report steps e.g.: perf record -o - ./hackbench 10 \| perf inject -v -b \| perf report -v -i - perf-inject reads a perf-record event stream and repipes it to stdout. At any point the processing code can inject other events into the event stream - in this case build-ids (-b option) are read and injected as needed into the event stream. Build-ids are just the first user of perf-inject - potentially anything that needs userspace processing to augment the trace stream with additional information could make use of this facility. Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> LKML-Reference: <1272696080-16435-3-git-send-email-tzanussi@gmail.com> Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2010-05-02 13:36:56 -03:00
Frederic Weisbecker	c61e52ee70	perf: Generalize perf lock's sample event reordering to the session layer The sample events recorded by perf record are not time ordered because we have one buffer per cpu for each event (even demultiplexed per task/per cpu for task bound events). But when we read trace events we want them to be ordered by time because many state machines are involved. There are currently two ways perf tools deal with that: - use -M to multiplex every buffers (perf sched, perf kmem) But this creates a lot of contention in SMP machines on record time. - use a post-processing time reordering (perf timechart, perf lock) The reordering used by timechart is simple but doesn't scale well with huge flow of events, in terms of performance and memory use (unusable with perf lock for example). Perf lock has its own samples reordering that flushes its memory use in a regular basis and that uses a sorting based on the previous event queued (a new event to be queued is close to the previous one most of the time). This patch proposes to export perf lock's samples reordering facility to the session layer that reads the events. So if a tool wants to get ordered sample events, it needs to set its struct perf_event_ops::ordered_samples to true and that's it. This prepares tracing based perf tools to get rid of the need to use buffers multiplexing (-M) or to implement their own reordering. Also lower the flush period to 2 as it's sufficient already. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Ingo Molnar <mingo@elte.hu> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Tom Zanussi <tzanussi@gmail.com>	2010-04-24 03:49:58 +02:00
Hitoshi Mitake	e4cef1f650	perf lock: Fix state machine to recognize lock sequence Previous state machine of perf lock was really broken. This patch improves it a little. This patch prepares the list of state machine that represents lock sequences for each threads. These state machines can be one of these sequences: 1) acquire -> acquired -> release 2) acquire -> contended -> acquired -> release 3) acquire (w/ try) -> release 4) acquire (w/ read) -> release The case of 4) is a little special. Double acquire of read lock is allowed, so the state machine counts read lock number, and permits double acquire and release. But, things are not so simple. Something in my model is still wrong. I counted the number of lock instances with bad sequence, and ratio is like this (case of tracing whoami): bad:233, total:2279 version 2: * threads are now identified with tid, not pid * prepared SEQ_STATE_READ_ACQUIRED for read lock. * bunch of struct lock_seq_stat is now linked list * debug information enhanced (this have to be removed someday) e.g. \| === output for debug=== \| \| bad:233, total:2279 \| bad rate:0.000000 \| histogram of events caused bad sequence \| acquire: 165 \| acquired: 0 \| contended: 0 \| release: 68 Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> LKML-Reference: <1271852634-9351-1-git-send-email-mitake@dcl.info.waseda.ac.jp> [rename SEQ_STATE_UNINITED to SEQ_STATE_UNINITIALIZED] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-04-24 03:23:14 +02:00
Ian Munsie	c055564217	perf: Fix endianness argument compatibility with OPT_BOOLEAN() and introduce OPT_INCR() Parsing an option from the command line with OPT_BOOLEAN on a bool data type would not work on a big-endian machine due to the manner in which the boolean was being cast into an int and incremented. For example, running 'perf probe --list' on a PowerPC machine would fail to properly set the list_events bool and would therefore print out the usage information and terminate. This patch makes OPT_BOOLEAN work as expected with a bool datatype. For cases where the original OPT_BOOLEAN was intentionally being used to increment an int each time it was passed in on the command line, this patch introduces OPT_INCR with the old behaviour of OPT_BOOLEAN (the verbose variable is currently the only such example of this). I have reviewed every use of OPT_BOOLEAN to verify that a true C99 bool was passed. Where integers were used, I verified that they were only being used for boolean logic and changed them to bools to ensure that they would not be mistakenly used as ints. The major exception was the verbose variable which now uses OPT_INCR instead of OPT_BOOLEAN. Signed-off-by: Ian Munsie <imunsie@au.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: <stable@kernel.org> # NOTE: wont apply to .3[34].x cleanly, please backport Cc: Git development list <git@vger.kernel.org> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: Valdis.Kletnieks@vt.edu Cc: WANG Cong <amwang@redhat.com> Cc: Thiago Farina <tfransosi@gmail.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Mike Galbraith <efault@gmx.de> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: John Kacur <jkacur@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1271147857-11604-1-git-send-email-imunsie@au.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-04-14 11:26:44 +02:00
Frederic Weisbecker	b67577dfb4	perf lock: Drop the buffers multiplexing dependency We need to deal with time ordered events to build a correct state machine of lock events. This is why we multiplex the lock events buffers. But the ordering is done from the kernel, on the tracing fast path, leading to high contention between cpus. Without multiplexing, the events appears in a weak order. If we have four events, each split per cpu, perf record will read the events buffers in the following order: [ CPU0 ev0, CPU0 ev1, CPU0 ev3, CPU0 ev4, CPU1 ev0, CPU1 ev0....] To handle a post processing reordering, we could just read and sort the whole in memory, but it just doesn't scale with high amounts of events: lock events can fill huge amounts in few times. Basically we need to sort in memory and find a "grace period" point when we know that a given slice of previously sorted events can be committed for post-processing, so that we can unload the memory usage step by step and keep a scalable sorting list. There is no strong rules about how to define such "grace period". What does this patch is: We define a FLUSH_PERIOD value that defines a grace period in seconds. We want to have a slice of events covering 2 * FLUSH_PERIOD in our sorted list. If FLUSH_PERIOD is big enough, it ensures every events that occured in the first half of the timeslice have all been buffered and there are none remaining and there won't be further to put inside this first timeslice. Then once we reach the 2 * FLUSH_PERIOD timeslice, we flush the first half to be gentle with the memory (the second half can still get new events in the middle, so wait another period to flush it) FLUSH_PERIOD is defined to 5 seconds. Say the first event started on time t0. We can safely assume that at the time we are processing events of t0 + 10 seconds, ther won't be anymore events to read from perf.data that occured between t0 and t0 + 5 seconds. Hence we can safely flush the first half. To point out funky bugs, we have a guardian that checks a new event timestamp is not below the last event's timestamp flushed and that displays a warning in this case. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com>	2010-02-27 17:06:19 +01:00
Ingo Molnar	59f411b62c	perf lock: Clean up various details Fix up a few small stylistic details: - use consistent vertical spacing/alignment - remove line80 artifacts - group some global variables better - remove dead code Plus rename 'prof' to 'report' to make it more in line with other tools, and remove the line/file keying as we really want to use IPs like the other tools do. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1264851813-8413-12-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-01-31 09:08:27 +01:00
Hitoshi Mitake	9b5e350c7a	perf lock: Introduce new tool "perf lock", for analyzing lock statistics Adding new subcommand "perf lock" to perf. I have a lot of remaining ToDos, but for now perf lock can already provide minimal functionality for analyzing lock statistics. Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1264851813-8413-12-git-send-email-mitake@dcl.info.waseda.ac.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-01-31 09:08:26 +01:00

9 Commits