linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-03 15:06:40 +07:00

Author	SHA1	Message	Date
Theodore Ts'o	12706394bc	ext4: add tracepoint for ext4_journal_start This will help debug who is responsible for starting a jbd2 transaction. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-10 22:37:50 -04:00
Theodore Ts'o	4862fd6047	jbd2: remove jbd2_dev_to_name() from jbd2 tracepoints Using function calls in TP_printk causes perf heartburn, so print the MAJOR/MINOR device numbers instead. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-07-10 22:05:08 -04:00
Wu Fengguang	e1cbe23601	writeback: trace global_dirty_state Add trace event balance_dirty_state for showing the global dirty page counts and thresholds at each global_dirty_limits() invocation. This will cover the callers throttle_vm_writeout(), over_bground_thresh() and each balance_dirty_pages() loop. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-07-09 22:09:03 -07:00
Wu Fengguang	d46db3d582	writeback: make writeback_control.nr_to_write straight Pass struct wb_writeback_work all the way down to writeback_sb_inodes(), and initialize the struct writeback_control there. struct writeback_control is basically designed to control writeback of a single file, but we keep abuse it for writing multiple files in writeback_sb_inodes() and its callers. It immediately clean things up, e.g. suddenly wbc.nr_to_write vs work->nr_pages starts to make sense, and instead of saving and restoring pages_skipped in writeback_sb_inodes it can always start with a clean zero value. It also makes a neat IO pattern change: large dirty files are now written in the full 4MB writeback chunk size, rather than whatever remained quota in wbc->nr_to_write. Acked-by: Jan Kara <jack@suse.cz> Proposed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-07-09 22:09:01 -07:00
David S. Miller	e12fe68ce3	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2011-07-05 23:23:37 -07:00
Liam Girdwood	a82ce2ae0d	ASoC: core - Add platform IO tracing Trace platform IO just like CODEC IO. Signed-off-by: Liam Girdwood <lrg@ti.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>	2011-07-05 11:08:10 -07:00
Lukas Czerner	99cb1a318c	jbd: Add fixed tracepoints This commit adds fixed tracepoint for jbd. It has been based on fixed tracepoints for jbd2, however there are missing those for collecting statistics, since I think that it will require more intrusive patch so I should have its own commit, if someone decide that it is needed. Also there are new tracepoints in __journal_drop_transaction() and journal_update_superblock(). The list of jbd tracepoints: jbd_checkpoint jbd_start_commit jbd_commit_locking jbd_commit_flushing jbd_commit_logging jbd_drop_transaction jbd_end_commit jbd_do_submit_data jbd_cleanup_journal_tail jbd_update_superblock_end Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>	2011-06-25 17:29:51 +02:00
Lukas Czerner	785c4bcc0d	ext3: Add fixed tracepoints This commit adds fixed tracepoints to the ext3 code. It is based on ext4 tracepoints, however due to the differences of both file systems, there are some tracepoints missing (those for delaloc and for multi-block allocator) and there are some ext3 specific as well (for reservation windows). Here is a list: ext3_free_inode ext3_request_inode ext3_allocate_inode ext3_evict_inode ext3_drop_inode ext3_mark_inode_dirty ext3_write_begin ext3_ordered_write_end ext3_writeback_write_end ext3_journalled_write_end ext3_ordered_writepage ext3_writeback_writepage ext3_journalled_writepage ext3_readpage ext3_releasepage ext3_invalidatepage ext3_discard_blocks ext3_request_blocks ext3_allocate_blocks ext3_free_blocks ext3_sync_file_enter ext3_sync_file_exit ext3_sync_fs ext3_rsv_window_add ext3_discard_reservation ext3_alloc_new_reservation ext3_reserved ext3_forget ext3_read_block_bitmap ext3_direct_IO_enter ext3_direct_IO_exit ext3_unlink_enter ext3_unlink_exit ext3_truncate_enter ext3_truncate_exit ext3_get_blocks_enter ext3_get_blocks_exit ext3_load_inode Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>	2011-06-25 17:29:51 +02:00
Satoru Moriya	3847ce32ae	core: add tracepoints for queueing skb to rcvbuf This patch adds 2 tracepoints to get a status of a socket receive queue and related parameter. One tracepoint is added to sock_queue_rcv_skb. It records rcvbuf size and its usage. The other tracepoint is added to __sk_mem_schedule and it records limitations of memory for sockets and current usage. By using these tracepoints we're able to know detailed reason why kernel drop the packet. Signed-off-by: Satoru Moriya <satoru.moriya@hds.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-21 16:06:10 -07:00
Satoru Moriya	296f7ea75b	udp: add tracepoints for queueing skb to rcvbuf This patch adds a tracepoint to __udp_queue_rcv_skb to get the return value of ip_queue_rcv_skb. It indicates why kernel drops a packet at this point. ip_queue_rcv_skb returns following values in the packet drop case: rcvbuf is full : -ENOMEM sk_filter returns error : -EINVAL, -EACCESS, -ENOMEM, etc. __sk_mem_schedule returns error: -ENOBUF Signed-off-by: Satoru Moriya <satoru.moriya@hds.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-21 16:06:10 -07:00
Linus Torvalds	890879cfa0	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: Fix oops in jbd2_journal_remove_journal_head() jbd2: Remove obsolete parameters in the comments for some jbd2 functions ext4: fixed tracepoints cleanup ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap ext4: Fix max file size and logical block counting of extent format file ext4: correct comments for ext4_free_blocks()	2011-06-21 10:22:35 -07:00
Linus Torvalds	357ed6b1a1	Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: Move RCU_BOOST #ifdefs to header file rcu: use softirq instead of kthreads except when RCU_BOOST=y rcu: Use softirq to address performance regression rcu: Simplify curing of load woes	2011-06-19 08:56:56 -07:00
KOSAKI Motohiro	d7911ef30c	vmscan: implement swap token priority aging While testing for memcg aware swap token, I observed a swap token was often grabbed an intermittent running process (eg init, auditd) and they never release a token. Why? Some processes (eg init, auditd, audispd) wake up when a process exiting. And swap token can be get first page-in process when a process exiting makes no swap token owner. Thus such above intermittent running process often get a token. And currently, swap token priority is only decreased at page fault path. Then, if the process sleep immediately after to grab swap token, the swap token priority never be decreased. That's obviously undesirable. This patch implement very poor (and lightweight) priority aging. It only be affect to the above corner case and doesn't change swap tendency workload performance (eg multi process qsbench load) Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-15 20:03:59 -07:00
KOSAKI Motohiro	83cd81a343	vmscan: implement swap token trace This is useful for observing swap token activity. example output: zsh-1845 [000] 598.962716: update_swap_token_priority: mm=ffff88015eaf7700 old_prio=1 new_prio=0 memtoy-1830 [001] 602.033900: update_swap_token_priority: mm=ffff880037a45880 old_prio=947 new_prio=949 memtoy-1830 [000] 602.041509: update_swap_token_priority: mm=ffff880037a45880 old_prio=949 new_prio=951 memtoy-1830 [000] 602.051959: update_swap_token_priority: mm=ffff880037a45880 old_prio=951 new_prio=953 memtoy-1830 [000] 602.052188: update_swap_token_priority: mm=ffff880037a45880 old_prio=953 new_prio=955 memtoy-1830 [001] 602.427184: put_swap_token: token_mm=ffff880037a45880 zsh-1789 [000] 602.427281: replace_swap_token: old_token_mm= (null) old_prio=0 new_token_mm=ffff88015eaf7018 new_prio=2 zsh-1789 [001] 602.433456: update_swap_token_priority: mm=ffff88015eaf7018 old_prio=2 new_prio=4 zsh-1789 [000] 602.437613: update_swap_token_priority: mm=ffff88015eaf7018 old_prio=4 new_prio=6 zsh-1789 [000] 602.443924: update_swap_token_priority: mm=ffff88015eaf7018 old_prio=6 new_prio=8 zsh-1789 [000] 602.451873: update_swap_token_priority: mm=ffff88015eaf7018 old_prio=8 new_prio=10 zsh-1789 [001] 602.462639: update_swap_token_priority: mm=ffff88015eaf7018 old_prio=10 new_prio=12 Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Rik van Riel<riel@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-06-15 20:03:59 -07:00
Shaohua Li	09223371de	rcu: Use softirq to address performance regression Commit a26ac2455ffcf3(rcu: move TREE_RCU from softirq to kthread) introduced performance regression. In an AIM7 test, this commit degraded performance by about 40%. The commit runs rcu callbacks in a kthread instead of softirq. We observed high rate of context switch which is caused by this. Out test system has 64 CPUs and HZ is 1000, so we saw more than 64k context switch per second which is caused by RCU's per-CPU kthread. A trace showed that most of the time the RCU per-CPU kthread doesn't actually handle any callbacks, but instead just does a very small amount of work handling grace periods. This means that RCU's per-CPU kthreads are making the scheduler do quite a bit of work in order to allow a very small amount of RCU-related processing to be done. Alex Shi's analysis determined that this slowdown is due to lock contention within the scheduler. Unfortunately, as Peter Zijlstra points out, the scheduler's real-time semantics require global action, which means that this contention is inherent in real-time scheduling. (Yes, perhaps someone will come up with a workaround -- otherwise, -rt is not going to do well on large SMP systems -- but this patch will work around this issue in the meantime. And "the meantime" might well be forever.) This patch therefore re-introduces softirq processing to RCU, but only for core RCU work. RCU callbacks are still executed in kthread context, so that only a small amount of RCU work runs in softirq context in the common case. This should minimize ksoftirqd execution, allowing us to skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Tested-by: "Alex,Shi" <alex.shi@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-06-14 15:25:39 -07:00
Wu Fengguang	e84d0a4f8e	writeback: trace event writeback_queue_io Note that it adds a little overheads to account the moved/enqueued inodes from b_dirty to b_io. The "moved" accounting may be later used to limit the number of inodes that can be moved in one shot, in order to keep spinlock hold time under control. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-06-08 08:25:23 +08:00
Wu Fengguang	251d6a471c	writeback: trace event writeback_single_inode It is valuable to know how the dirty inodes are iterated and their IO size. "writeback_single_inode: bdi 8:0: ino=134246746 state=I_DIRTY_SYNC\|I_SYNC age=414 index=0 to_write=1024 wrote=0" - "state" reflects inode->i_state at the end of writeback_single_inode() - "index" reflects mapping->writeback_index after the ->writepages() call - "to_write" is the wbc->nr_to_write at entrance of writeback_single_inode() - "wrote" is the number of pages actually written v2: add trace event writeback_single_inode_requeue as proposed by Dave. CC: Dave Chinner <david@fromorbit.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-06-08 08:25:23 +08:00
Wu Fengguang	846d5a091b	writeback: remove .nonblocking and .encountered_congestion Remove two unused struct writeback_control fields: .encountered_congestion (completely unused) .nonblocking (never set, checked/showed in XFS,NFS/btrfs) The .for_background check in nfs_write_inode() is also removed btw, as .for_background implies WB_SYNC_NONE. Reviewed-by: Jan Kara <jack@suse.cz> Proposed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-06-08 08:25:23 +08:00
Wu Fengguang	b7a2441f99	writeback: remove writeback_control.more_io When wbc.more_io was first introduced, it indicates whether there are at least one superblock whose s_more_io contains more IO work. Now with the per-bdi writeback, it can be replaced with a simple b_more_io test. Acked-by: Jan Kara <jack@suse.cz> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-06-08 08:25:23 +08:00
Lukas Czerner	a9c667f8f0	ext4: fixed tracepoints cleanup While creating fixed tracepoints for ext3, basically by porting them from ext4, I found a lot of useless retyping, wrong type usage, useless variable passing and other inconsistencies in the ext4 fixed tracepoint code. This patch cleans the fixed tracepoint code for ext4 and also simplify some of them. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-06-06 09:51:52 -04:00
Linus Torvalds	0e833d8cfc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits) tg3: Fix tg3_skb_error_unmap() net: tracepoint of net_dev_xmit sees freed skb and causes panic drivers/net/can/flexcan.c: add missing clk_put net: dm9000: Get the chip in a known good state before enabling interrupts drivers/net/davinci_emac.c: add missing clk_put af-packet: Add flag to distinguish VID 0 from no-vlan. caif: Fix race when conditionally taking rtnl lock usbnet/cdc_ncm: add missing .reset_resume hook vlan: fix typo in vlan_dev_hard_start_xmit() net/ipv4: Check for mistakenly passed in non-IPv4 address iwl4965: correctly validate temperature value bluetooth l2cap: fix locking in l2cap_global_chan_by_psm ath9k: fix two more bugs in tx power cfg80211: don't drop p2p probe responses Revert "net: fix section mismatches" drivers/net/usb/catc.c: Fix potential deadlock in catc_ctrl_run() sctp: stop pending timers and purge queues when peer restart asoc drivers/net: ks8842 Fix crash on received packet when in PIO mode. ip_options_compile: properly handle unaligned pointer iwlagn: fix incorrect PCI subsystem id for 6150 devices ...	2011-06-04 23:16:00 +09:00
Koki Sanagi	ec764bf083	net: tracepoint of net_dev_xmit sees freed skb and causes panic Because there is a possibility that skb is kfree_skb()ed and zero cleared after ndo_start_xmit, we should not see the contents of skb like skb->len and skb->dev->name after ndo_start_xmit. But trace_net_dev_xmit does that and causes panic by NULL pointer dereference. This patch fixes trace_net_dev_xmit not to see the contents of skb directly. If you want to reproduce this panic, 1. Get tracepoint of net_dev_xmit on 2. Create 2 guests on KVM 2. Make 2 guests use virtio_net 4. Execute netperf from one to another for a long time as a network burden 5. host will panic(It takes about 30 minutes) Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-02 14:06:31 -07:00
Linus Torvalds	c4a227d89f	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits) perf: Fix SIGIO handling perf top: Don't stop if no kernel symtab is found perf top: Handle kptr_restrict perf top: Remove unused macro perf events: initialize fd array to -1 instead of 0 perf tools: Make sure kptr_restrict warnings fit 80 col terms perf tools: Fix build on older systems perf symbols: Handle /proc/sys/kernel/kptr_restrict perf: Remove duplicate headers ftrace: Add internal recursive checks tracing: Update btrfs's tracepoints to use u64 interface tracing: Add __print_symbolic_u64 to avoid warnings on 32bit machine ftrace: Set ops->flag to enabled even on static function tracing tracing: Have event with function tracer check error return ftrace: Have ftrace_startup() return failure code jump_label: Check entries limit in __jump_label_update ftrace/recordmcount: Avoid STT_FUNC symbols as base on ARM scripts/tags.sh: Add magic for trace-events for etags too scripts/tags.sh: Fix ctags for DEFINE_EVENT() x86/ftrace: Fix compiler warning in ftrace.c ...	2011-05-28 12:55:55 -07:00
Linus Torvalds	8b29336fe0	Merge branch 'gpio/next' of git://git.secretlab.ca/git/linux-2.6 * 'gpio/next' of git://git.secretlab.ca/git/linux-2.6: gpio/via: rename VIA local config struct basic_mmio_gpio: split into a gpio library and platform device gpio: remove some legacy comments in build files gpio: add trace events for setting direction and value gpio/pca953x: Use handle_simple_irq instead of handle_edge_irq gpiolib: export gpiochip_find gpio: remove redundant Kconfig depends on GPIOLIB basic_mmio_gpio: convert to non-__raw* accessors basic_mmio_gpio: support direction registers basic_mmio_gpio: support different input/output registers basic_mmio_gpio: detect output method at probe time basic_mmio_gpio: request register regions basic_mmio_gpio: allow overriding number of gpio basic_mmio_gpio: convert to platform_{get,set}_drvdata() basic_mmio_gpio: remove runtime width/endianness evaluation	2011-05-26 12:14:41 -07:00
liubo	7f34b746f7	tracing: Update btrfs's tracepoints to use u64 interface To avoid 64->32 truncating WARNING, update btrfs's tracepoints. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Link: http://lkml.kernel.org/r/4DACE6E3.8080200@cn.fujitsu.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-05-25 22:13:47 -04:00
liubo	2fc1b6f0d0	tracing: Add __print_symbolic_u64 to avoid warnings on 32bit machine Filesystem, like Btrfs, has some "ULL" macros, and when these macros are passed to tracepoints'__print_symbolic(), there will be 64->32 truncate WARNINGS during compiling on 32bit box. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Link: http://lkml.kernel.org/r/4DACE6E0.7000507@cn.fujitsu.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-05-25 22:13:44 -04:00
Uwe Kleine-König	3f397c2144	gpio: add trace events for setting direction and value This patch allows to trace gpio operations using ftrace Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2011-05-20 00:40:19 -06:00
Linus Torvalds	eb04f2f04e	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (78 commits) Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" net,rcu: convert call_rcu(prl_entry_destroy_rcu) to kfree batman,rcu: convert call_rcu(softif_neigh_free_rcu) to kfree_rcu batman,rcu: convert call_rcu(neigh_node_free_rcu) to kfree() batman,rcu: convert call_rcu(gw_node_free_rcu) to kfree_rcu net,rcu: convert call_rcu(kfree_tid_tx) to kfree_rcu() net,rcu: convert call_rcu(xt_osf_finger_free_rcu) to kfree_rcu() net/mac80211,rcu: convert call_rcu(work_free_rcu) to kfree_rcu() net,rcu: convert call_rcu(wq_free_rcu) to kfree_rcu() net,rcu: convert call_rcu(phonet_device_rcu_free) to kfree_rcu() perf,rcu: convert call_rcu(swevent_hlist_release_rcu) to kfree_rcu() perf,rcu: convert call_rcu(free_ctx) to kfree_rcu() net,rcu: convert call_rcu(__nf_ct_ext_free_rcu) to kfree_rcu() net,rcu: convert call_rcu(net_generic_release) to kfree_rcu() net,rcu: convert call_rcu(netlbl_unlhsh_free_addr6) to kfree_rcu() net,rcu: convert call_rcu(netlbl_unlhsh_free_addr4) to kfree_rcu() security,rcu: convert call_rcu(sel_netif_free) to kfree_rcu() net,rcu: convert call_rcu(xps_dev_maps_release) to kfree_rcu() net,rcu: convert call_rcu(xps_map_release) to kfree_rcu() net,rcu: convert call_rcu(rps_map_release) to kfree_rcu() ...	2011-05-19 18:14:34 -07:00
Mel Gorman	1d929b7a84	mm: tracing: add missing GFP flags to tracing include/linux/gfp.h and include/trace/events/gfpflags.h are out of sync. When tracing is enabled, certain flags are not recognised and the text output is less useful as a result. Add the missing flags. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-11 18:50:45 -07:00
Paul E. McKenney	a26ac2455f	rcu: move TREE_RCU from softirq to kthread If RCU priority boosting is to be meaningful, callback invocation must be boosted in addition to preempted RCU readers. Otherwise, in presence of CPU real-time threads, the grace period ends, but the callbacks don't get invoked. If the callbacks don't get invoked, the associated memory doesn't get freed, so the system is still subject to OOM. But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit moves the callback invocations to a kthread, which can be boosted easily. Also add comments and properly synchronized all accesses to rcu_cpu_kthread_task, as suggested by Lai Jiangshan. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-05-05 23:16:54 -07:00
Jens Axboe	49cac01e1f	block: make unplug timer trace event correspond to the schedule() unplug It's a pretty close match to what we had before - the timer triggering would mean that nobody unplugged the plug in due time, in the new scheme this matches very closely what the schedule() unplug now is. It's essentially the difference between an explicit unplug (IO unplug) or an implicit unplug (timer unplug, we scheduled with pending IO queued). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-04-16 13:51:05 +02:00
Jens Axboe	94b5eb28b4	block: fixup block IO unplug trace call It was removed with the on-stack plugging, readd it and track the depth of requests added when flushing the plug. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-04-12 10:12:19 +02:00
Jens Axboe	d9c9783317	block: remove block_unplug_timer() trace point We no longer have an unplug timer running, so no point in keeping the trace point. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-04-12 10:06:33 +02:00
Linus Torvalds	212a17ab87	Merge branch 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (45 commits) Btrfs: fix __btrfs_map_block on 32 bit machines btrfs: fix possible deadlock by clearing __GFP_FS flag btrfs: check link counter overflow in link(2) btrfs: don't mess with i_nlink of unlocked inode in rename() Btrfs: check return value of btrfs_alloc_path() Btrfs: fix OOPS of empty filesystem after balance Btrfs: fix memory leak of empty filesystem after balance Btrfs: fix return value of setflags ioctl Btrfs: fix uncheck memory allocations btrfs: make inode ref log recovery faster Btrfs: add btrfs_trim_fs() to handle FITRIM Btrfs: adjust btrfs_discard_extent() return errors and trimmed bytes Btrfs: make btrfs_map_block() return entire free extent for each device of RAID0/1/10/DUP Btrfs: make update_reserved_bytes() public btrfs: return EXDEV when linking from different subvolumes Btrfs: Per file/directory controls for COW and compression Btrfs: add datacow flag in inode flag btrfs: use GFP_NOFS instead of GFP_KERNEL Btrfs: check return value of read_tree_block() btrfs: properly access unaligned checksum buffer ... Fix up trivial conflicts in fs/btrfs/volumes.c due to plug removal in the block layer.	2011-03-28 15:31:05 -07:00
liubo	1abe9b8a13	Btrfs: add initial tracepoint support for btrfs Tracepoints can provide insight into why btrfs hits bugs and be greatly helpful for debugging, e.g dd-7822 [000] 2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0 dd-7822 [000] 2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0 btrfs-transacti-7804 [001] 2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0) btrfs-transacti-7804 [001] 2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0) btrfs-transacti-7804 [001] 2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8 flush-btrfs-2-7821 [001] 2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA flush-btrfs-2-7821 [001] 2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0) flush-btrfs-2-7821 [001] 2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0) flush-btrfs-2-7821 [000] 2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0) Here is what I have added: 1) ordere_extent: btrfs_ordered_extent_add btrfs_ordered_extent_remove btrfs_ordered_extent_start btrfs_ordered_extent_put These provide critical information to understand how ordered_extents are updated. 2) extent_map: btrfs_get_extent extent_map is used in both read and write cases, and it is useful for tracking how btrfs specific IO is running. 3) writepage: __extent_writepage btrfs_writepage_end_io_hook Pages are cirtical resourses and produce a lot of corner cases during writeback, so it is valuable to know how page is written to disk. 4) inode: btrfs_inode_new btrfs_inode_request btrfs_inode_evict These can show where and when a inode is created, when a inode is evicted. 5) sync: btrfs_sync_file btrfs_sync_fs These show sync arguments. 6) transaction: btrfs_transaction_commit In transaction based filesystem, it will be useful to know the generation and who does commit. 7) back reference and cow: btrfs_delayed_tree_ref btrfs_delayed_data_ref btrfs_delayed_ref_head btrfs_cow_block Btrfs natively supports back references, these tracepoints are helpful on understanding btrfs's COW mechanism. 8) chunk: btrfs_chunk_alloc btrfs_chunk_free Chunk is a link between physical offset and logical offset, and stands for space infomation in btrfs, and these are helpful on tracing space things. 9) reserved_extent: btrfs_reserved_extent_alloc btrfs_reserved_extent_free These can show how btrfs uses its space. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-03-28 05:37:33 -04:00
Linus Torvalds	ae005cbed1	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (43 commits) ext4: fix a BUG in mb_mark_used during trim. ext4: unused variables cleanup in fs/ext4/extents.c ext4: remove redundant set_buffer_mapped() in ext4_da_get_block_prep() ext4: add more tracepoints and use dev_t in the trace buffer ext4: don't kfree uninitialized s_group_info members ext4: add missing space in printk's in __ext4_grp_locked_error() ext4: add FITRIM to compat_ioctl. ext4: handle errors in ext4_clear_blocks() ext4: unify the ext4_handle_release_buffer() api ext4: handle errors in ext4_rename jbd2: add COW fields to struct jbd2_journal_handle jbd2: add the b_cow_tid field to journal_head struct ext4: Initialize fsync transaction ids in ext4_new_inode() ext4: Use single thread to perform DIO unwritten convertion ext4: optimize ext4_bio_write_page() when no extent conversion is needed ext4: skip orphan cleanup if fs has unknown ROCOMPAT features ext4: use the nblocks arg to ext4_truncate_restart_trans() ext4: fix missing iput of root inode for some mount error paths ext4: make FIEMAP and delayed allocation play well together ext4: suppress verbose debugging information if malloc-debug is off ... Fi up conflicts in fs/ext4/super.c due to workqueue changes	2011-03-25 09:57:41 -07:00
Jiaying Zhang	0562e0bad4	ext4: add more tracepoints and use dev_t in the trace buffer - Add more ext4 tracepoints. - Change ext4 tracepoints to use dev_t field with MAJOR/MINOR macros so that we can save 4 bytes in the ring buffer on some platforms. - Add sync_mode to ext4_da_writepages, ext4_da_write_pages, and ext4_da_writepages_result tracepoints. Also remove for_reclaim field from ext4_da_writepages since it is usually not very useful. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-03-21 21:38:05 -04:00
Linus Torvalds	d3e458d781	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (308 commits) ALSA: sound/pci/asihpi: check adapter index in hpi_ioctl ALSA: aloop - Fix possible IRQ lock inversion ALSA: sound/core: merge list_del()/list_add_tail() to list_move_tail() ALSA: ctxfi - use list_move() instead of list_del()/list_add() combination ALSA: firewire - msleep needs delay.h ALSA: firewire-lib, firewire-speakers: handle packet queueing errors ALSA: firewire-lib: allocate DMA buffer separately ALSA: firewire-lib: use no-info SYT for packets without SYT sample ALSA: add LaCie FireWire Speakers/Griffin FireWave Surround driver ALSA: hda - Remove an unused variable in patch_realtek.c ALSA: hda - pin-adc-mux-dmic auto-configuration of 92HD8X codecs ALSA: hda - fix digital mic selection in mixer on 92HD8X codecs ALSA: hda - Move default input-src selection to init part ALSA: hda - Initialize special cases for input src in init phase ALSA: ctxfi - Clear input settings before initialization ALSA: ctxfi - Fix SPDIF status retrieval ALSA: ctxfi - Fix incorrect SPDIF status bit mask ALSA: ctxfi - Fix microphone boost codes/comments ALSA: atiixp - Fix wrong time-out checks during ac-link reset ALSA: intel8x0m: append 'm' to "r_intel8x0" ...	2011-03-18 10:46:37 -07:00
Linus Torvalds	c55d267de2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (170 commits) [SCSI] scsi_dh_rdac: Add MD36xxf into device list [SCSI] scsi_debug: add consecutive medium errors [SCSI] libsas: fix ata list corruption issue [SCSI] hpsa: export resettable host attribute [SCSI] hpsa: move device attributes to avoid forward declarations [SCSI] scsi_debug: Logical Block Provisioning (SBC3r26) [SCSI] sd: Logical Block Provisioning update [SCSI] Include protection operation in SCSI command trace [SCSI] hpsa: fix incorrect PCI IDs and add two new ones (2nd try) [SCSI] target: Fix volume size misreporting for volumes > 2TB [SCSI] bnx2fc: Broadcom FCoE offload driver [SCSI] fcoe: fix broken fcoe interface reset [SCSI] fcoe: precedence bug in fcoe_filter_frames() [SCSI] libfcoe: Remove stale fcoe-netdev entries [SCSI] libfcoe: Move FCOE_MTU definition from fcoe.h to libfcoe.h [SCSI] libfc: introduce __fc_fill_fc_hdr that accepts fc_hdr as an argument [SCSI] fcoe, libfc: initialize EM anchors list and then update npiv EMs [SCSI] Revert "[SCSI] libfc: fix exchange being deleted when the abort itself is timed out" [SCSI] libfc: Fixing a memory leak when destroying an interface [SCSI] megaraid_sas: Version and Changelog update ... Fix up trivial conflicts due to whitespace differences in drivers/scsi/libsas/{sas_ata.c,sas_scsi_host.c}	2011-03-17 17:54:40 -07:00
Linus Torvalds	f74b944419	Merge branch 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: BKL: That's all, folks fs/locks.c: Remove stale FIXME left over from BKL conversion ipx: remove the BKL appletalk: remove the BKL x25: remove the BKL ufs: remove the BKL hpfs: remove the BKL drivers: remove extraneous includes of smp_lock.h tracing: don't trace the BKL adfs: remove the big kernel lock	2011-03-16 17:21:00 -07:00
Takashi Iwai	27b92d4ff2	Merge branch 'topic/asoc' into for-linus	2011-03-16 17:38:41 +01:00
Martin K. Petersen	72f7d322fd	[SCSI] Include protection operation in SCSI command trace When debugging DIF/DIX it is very helpful to be able to see which DIX operation is associated with the scsi_cmnd. Include the protection op in the SCSI command trace. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2011-03-14 18:36:02 -05:00
David Sharp	ca9da2dd63	tracing: Fix event alignment: skb:kfree_skb Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David Sharp <dhsharp@google.com> LKML-Reference: <1291421609-14665-10-git-send-email-dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-03-10 10:34:31 -05:00
David Sharp	ad440ad66f	tracing: Fix event alignment: mce:mce_record Signed-off-by: David Sharp <dhsharp@google.com> LKML-Reference: <1291421609-14665-9-git-send-email-dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-03-10 10:34:28 -05:00
David Sharp	b5e3008e48	tracing: Fix event alignment: module:module_request Acked-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David Sharp <dhsharp@google.com> LKML-Reference: <1291421609-14665-7-git-send-email-dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-03-10 10:34:20 -05:00
Tao Ma	2d3a8497f8	blktrace: Remove blk_fill_rwbs_rq. If we enable trace events to trace block actions, We use blk_fill_rwbs_rq to analyze the corresponding actions in request's cmd_flags, but we only choose the minor 2 bits from it, so most of other flags(e.g, REQ_SYNC) are missing. For example, with a sync write we get: write_test-2409 [001] 160.013869: block_rq_insert: 3,64 W 0 () 258135 + = 8 [write_test] Since now we have integrated the flags of both bio and request, it is safe to pass rq->cmd_flags directly to blk_fill_rwbs and blk_fill_rwbs_rq isn't needed any more. With this patch, after a sync write we get: write_test-2417 [000] 226.603878: block_rq_insert: 3,64 WS 0 () 258135 += 8 [write_test] Signed-off-by: Tao Ma <boyu.mt@taobao.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-03 10:53:20 -05:00
Arnd Bergmann	f51b452bed	tracing: don't trace the BKL No reason to trace it when the last user is gone. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Frederic Weisbecker <fweisbec@gmail.com>	2011-03-02 00:02:39 +01:00
Steven Rostedt	e4a9ea5ee7	tracing: Replace trace_event struct array with pointer array Currently the trace_event structures are placed in the _ftrace_events section, and at link time, the linker makes one large array of all the trace_event structures. On boot up, this array is read (much like the initcall sections) and the events are processed. The problem is that there is no guarantee that gcc will place complex structures nicely together in an array format. Two structures in the same file may be placed awkwardly, because gcc has no clue that they are suppose to be in an array. A hack was used previous to force the alignment to 4, to pack the structures together. But this caused alignment issues with other architectures (sparc). Instead of packing the structures into an array, the structures' addresses are now put into the _ftrace_event section. As pointers are always the natural alignment, gcc should always pack them tightly together (otherwise initcall, extable, etc would also fail). By having the pointers to the structures in the section, we can still iterate the trace_events without causing unnecessary alignment problems with other architectures, or depending on the current behaviour of gcc that will likely change in the future just to tick us kernel developers off a little more. The _ftrace_event section is also moved into the .init.data section as it is now only needed at boot up. Suggested-by: David Miller <davem@davemloft.net> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-02-02 21:37:13 -05:00
Dimitris Papastamos	c358e640a6	ASoC: soc-cache: Add trace event for snd_soc_cache_sync() This patch makes it easy to see when the syncing process begins and ends. You can also enable the snd_soc_reg_write tracepoint to see which registers are being synced. Signed-off-by: Dimitris Papastamos <dp@opensource.wolfsonmicro.com> Acked-by: Liam Girdwood <lrg@slimlogic.co.uk> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>	2011-01-21 18:30:51 +00:00
Linus Torvalds	f9ee7f60d6	Merge branches 'core-fixes-for-linus', 'x86-fixes-for-linus', 'timers-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: avoid pointless blocked-task warnings rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status rtmutex: Fix comment about why new_owner can be NULL in wake_futex_pi() * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, olpc: Add missing Kconfig dependencies x86, mrst: Set correct APB timer IRQ affinity for secondary cpu x86: tsc: Fix calibration refinement conditionals to avoid divide by zero x86, ia64, acpi: Clean up x86-ism in drivers/acpi/numa.c * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timekeeping: Make local variables static time: Rename misnamed minsec argument of clocks_calc_mult_shift() * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Remove syscall_exit_fields tracing: Only process module tracepoints once perf record: Add "nodelay" mode, disabled by default perf sched: Fix list of events, dropping unsupported ':r' modifier Revert "perf tools: Emit clearer message for sys_perf_event_open ENOENT return" perf top: Fix annotate segv perf evsel: Fix order of event list deletion	2011-01-15 12:45:00 -08:00
Steven Rostedt	c94fbe1d9e	tracing: Only process module tracepoints once The commit: 9f987b3141f086de27832514aad9f50a53f754 tracing: Include module.h in define_trace.h only solved half the problem. If the trace/events/module.h header is included at the time of define_trace.h (or in ftrace.h within it), the module.h TRACE_SYSTEM will override the current TRACE_SYSTEM macro. Since define_trace.h is included when CREATE_TRACE_POINTS is set, and the first thing it does is to #undef CREATE_TRACE_POINTS, by placing the module.h TRACE_SYSTEM inside a #ifdef CREATE_TRACE_POINTS we can prevent it from overriding the TRACE_SYSTEM that is being processed, and still process the module.h tracepoints when the module code defines CREATE_TRACE_POINTS and includes the trace/events/module.h header. As with commit 9f987b3141, this is only an issue if module.h is not included before the trace/events/<event>.h file is included, which (luckily) has not happened yet. Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-01-14 11:25:58 -05:00
Mel Gorman	f3a310bc4e	mm: vmscan: rename lumpy_mode to reclaim_mode With compaction being used instead of lumpy reclaim, the name lumpy_mode and associated variables is a bit misleading. Rename lumpy_mode to reclaim_mode which is a better fit. There is no functional change. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:34 -08:00
Mel Gorman	ee64fc9354	mm: vmscan: convert lumpy_mode into a bitmask Currently lumpy_mode is an enum and determines if lumpy reclaim is off, syncronous or asyncronous. In preparation for using compaction instead of lumpy reclaim, this patch converts the flags into a bitmap. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:33 -08:00
Mel Gorman	b7aba6984d	mm: compaction: add trace events for memory compaction activity In preparation for a patches promoting the use of memory compaction over lumpy reclaim, this patch adds trace points for memory compaction activity. Using them, we can monitor the scanning activity of the migration and free page scanners as well as the number and success rates of pages passed to page migration. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:33 -08:00
Wu Fengguang	71927e84e0	writeback: trace wakeup event for background writeback This tracks when balance_dirty_pages() tries to wakeup the flusher thread for background writeback (if it was not started already). Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Engelhardt <jengelh@medozas.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:32 -08:00
Linus Torvalds	275220f0fc	Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits) block: ensure that completion error gets properly traced blktrace: add missing probe argument to block_bio_complete block cfq: don't use atomic_t for cfq_group block cfq: don't use atomic_t for cfq_queue block: trace event block fix unassigned field block: add internal hd part table references block: fix accounting bug on cross partition merges kref: add kref_test_and_get bio-integrity: mark kintegrityd_wq highpri and CPU intensive block: make kblockd_workqueue smarter Revert "sd: implement sd_check_events()" block: Clean up exit_io_context() source code. Fix compile warnings due to missing removal of a 'ret' variable fs/block: type signature of major_to_index(int) to major_to_index(unsigned) block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p) cfq-iosched: don't check cfqg in choose_service_tree() fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors cdrom: export cdrom_check_events() sd: implement sd_check_events() sr: implement sr_check_events() ...	2011-01-13 10:45:01 -08:00
Linus Torvalds	66dc918d42	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (348 commits) ALSA: hda - Fix NULL-derefence with a single mic in STAC auto-mic detection ALSA: hda - Add missing NID 0x19 fixup for Sony VAIO ALSA: hda - Fix ALC275 enable hardware EQ for SONY VAIO ALSA: oxygen: fix Xonar DG input ALSA: hda - Fix EAPD on Lenovo NB ALC269 to low ALSA: hda - Fix missing EAPD for Acer 4930G ALSA: hda: Disable 4/6 channels on some NVIDIA GPUs. ALSA: hda - Add static_hdmi_pcm option to HDMI codec parser ALSA: hda - Don't refer ELD when unplugged ASoC: tpa6130a2: Fix compiler warning ASoC: tlv320dac33: Add DAPM selection for LOM invert ASoC: DMIC codec: Adding a generic DMIC codec ALSA: snd-usb-us122l: Fix missing NULL checks ALSA: snd-usb-us122l: Fix MIDI output ASoC: soc-cache: Fix invalid memory access during snd_soc_lzo_cache_sync() ASoC: Fix section mismatch in wm8995.c ALSA: oxygen: add S/PDIF source selection for Claro cards ALSA: oxygen: fix CD/MIDI for X-Meridian (2G) ASoC: fix migor audio build ALSA: include delay.h for msleep in Xonar DG support ...	2011-01-13 10:32:54 -08:00
Linus Torvalds	1896a1346a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6: (45 commits) regulator: missing index in PTR_ERR() in isl6271a_probe() regulator: Assign return value of mc13xxx_reg_rmw to ret regulator: Add initial per-regulator debugfs support regulator: Make regulator_has_full_constraints a bool regulator: Clean up logging a bit regulator: Optimise out noop voltage changes regulator: Add API to re-apply voltage to hardware regulator: Staticise non-exported functions in mc13892 regulator: Only notify voltage changes when they succeed regulator: Provide a selector based set_voltage_sel() operation regulator: Factor out voltage set operation into a separate function regulator: Convert WM8994 to use get_voltage_sel() regulator: Convert WM835x to use get_voltage_sel() regulator: Allow modular build of mc13xxx-core regulator: support PMIC mc13892 make mc13783 regulator code generic Change the register name definitions for mc13783 mach-ux500: Updated and connected ab8500 regulator board configuration regulators: Removed macros for initialization of ab8500 regulators regulators: Added verbose debug messages to ab8500 regulators ...	2011-01-13 10:24:07 -08:00
Linus Torvalds	55065bc527	Merge branch 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits) KVM: Initialize fpu state in preemptible context KVM: VMX: when entering real mode align segment base to 16 bytes KVM: MMU: handle 'map_writable' in set_spte() function KVM: MMU: audit: allow audit more guests at the same time KVM: Fetch guest cr3 from hardware on demand KVM: Replace reads of vcpu->arch.cr3 by an accessor KVM: MMU: only write protect mappings at pagetable level KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear() KVM: MMU: Initialize base_role for tdp mmus KVM: VMX: Optimize atomic EFER load KVM: VMX: Add definitions for more vm entry/exit control bits KVM: SVM: copy instruction bytes from VMCB KVM: SVM: implement enhanced INVLPG intercept KVM: SVM: enhance mov DR intercept handler KVM: SVM: enhance MOV CR intercept handler KVM: SVM: add new SVM feature bit names KVM: cleanup emulate_instruction KVM: move complete_insn_gp() into x86.c KVM: x86: fix CR8 handling KVM guest: Fix kvm clock initialization when it's configured out ...	2011-01-13 10:14:24 -08:00
Mark Brown	02fa3ec01a	regulator: Add basic trace facilities Provide some basic trace facilities to the regulator API. We generate events on regulator enable, disable and voltage setting over the actual hardware operations (which are assumed to be the expensive ones which require interaction with the actual device). This is intended to facilitate debug of the performance and behaviour with consumers allowing unified traces to be generated including the regulator operations within the context of the other components of the system. For enable we log the explicit delay for the voltage ramp separately to the interaction with the hardware to highlight the time consumed in I/O. We should add a similar delay for voltage changes, though there the relatively small magnitude of the changes in the context of the I/O costs makes it much less critical for most regulators. Only hardware interactions are currently traced as the primary focus is on the performance and synchronisation of actual hardware interactions. Additional tracepoints for debugging of the logical operations can be added later if required. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>	2011-01-12 14:33:00 +00:00
Xiao Guangrong	0730388b97	KVM: cleanup async_pf tracepoints Use 'DECLARE_EVENT_CLASS' to cleanup async_pf tracepoints Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:28:57 +02:00
Xiao Guangrong	c9b263d2be	KVM: fix tracing kvm_try_async_get_page Tracing 'async' and pfn is useless, since 'async' is always true, and 'pfn' is always "fault_pfn' We can trace 'gva' and 'gfn' instead, it can help us to see the life-cycle of an async_pf Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:28:56 +02:00
Gleb Natapov	64be500706	KVM: x86: trace "exit to userspace" event Add tracepoint for userspace exit. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:28:44 +02:00
Gleb Natapov	7c90705bf2	KVM: Inject asynchronous page fault into a PV guest if page is swapped out. Send async page fault to a PV guest if it accesses swapped out memory. Guest will choose another task to run upon receiving the fault. Allow async page fault injection only when guest is in user mode since otherwise guest may be in non-sleepable context and will not be able to reschedule. Vcpu will be halted if guest will fault on the same page again or if vcpu executes kernel code. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:17 +02:00
Gleb Natapov	af585b921e	KVM: Halt vcpu if page it tries to access is swapped out If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. [avi: remove call to get_user_pages_noio(), nacked by Linus; this makes everything synchrnous again] Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:21:39 +02:00
Mathieu Desnoyers	881fe4bdcd	tracing: remove duplicate null-pointer check in skb tracepoint The check for NULL skb in the kfree_skb trace event is a duplicate from the check already done in its only caller, kfree_skb(). Remove this duplicate check. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20110106175319.GA30610@Krystal> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: David S. Miller <davem@davemloft.net> CC: Steven Rostedt <rostedt@goodmis.org> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Ingo Molnar <mingo@elte.hu> CC: Thomas Gleixner <tglx@linutronix.de> CC: Zhaolei <zhaolei@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-01-07 20:55:42 -05:00
Steven Rostedt	3a9f987b31	tracing: Include module.h in define_trace.h While doing some developing, Peter Zijlstra and I have found that if a CREATE_TRACE_POINTS include is done before module.h is included, it can break the build. We have been lucky so far that this has not broke the build since module.h is included in almost everything. Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-01-07 15:44:56 -05:00
Jeff Moyer	b7908c1035	block: trace event block fix unassigned field The "error" field in block_bio_complete is not assigned, leaving the memory area uninitialized (keeping garbage data). Pass an additional tracepoint argument to this event to initialize this field. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Ingo Molnar <mingo@elte.hu> CC: Thomas Gleixner <tglx@linutronix.de> CC: Li Zefan <lizf@cn.fujitsu.com> CC: Alan.Brunelle@hp.com Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-01-07 08:43:55 +01:00
Thomas Renninger	25e41933b5	perf: Clean up power events by introducing new, more generic ones Add these new power trace events: power:cpu_idle power:cpu_frequency power:machine_suspend The old C-state/idle accounting events: power:power_start power:power_end Have now a replacement (but we are still keeping the old tracepoints for compatibility): power:cpu_idle and power:power_frequency is replaced with: power:cpu_frequency power:machine_suspend is newly introduced. Jean Pihet has a patch integrated into the generic layer (kernel/power/suspend.c) which will make use of it. the type= field got removed from both, it was never used and the type is differed by the event type itself. perf timechart userspace tool gets adjusted in a separate patch. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Jean Pihet <jean.pihet@newoldbits.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: rjw@sisk.pl LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>	2011-01-04 08:16:54 +01:00
Mark Brown	3028eb8c51	ASoC: Add trace events for jack detection As jack detection can trigger DAPM and the latency in debouncing can create confusing windows in operation provide some trace events which will hopefully help in diagnostics. The soc-jack core traces all reports that it gets and the resulting notifications to upper layers. An event for jack IRQs is also provided for instrumentation of debounce, and used in the GPIO jack code. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>	2010-12-06 14:13:42 +00:00
Steven Rostedt	287050d390	tracing: Add TRACE_EVENT_CONDITIONAL() There are instances in the kernel that we only want to trace a tracepoint when a certain condition is set. But we do not want to test for that condition in the core kernel. If we test for that condition before calling the tracepoin, then we will be performing that test even when tracing is not enabled. This is 99.99% of the time. We currently can just filter out on that condition, but that happens after we write to the trace buffer. We just wasted time writing to the ring buffer for an event we never cared about. This patch adds: TRACE_EVENT_CONDITION() and DEFINE_EVENT_CONDITION() These have a new TP_CONDITION() argument that comes right after the TP_ARGS(). This condition can use the parameters of TP_ARGS() in the TRACE_EVENT() to determine if the tracepoint should be traced or not. The TP_CONDITION() will be placed in a if (cond) trace; For example, for the tracepoint sched_wakeup, it is useless to trace a wakeup event where the caller never actually wakes anything up (where success == 0). So adding: TP_CONDITION(success), which uses the "success" parameter of the wakeup tracepoint will have it only trace when we have successfully woken up a task. Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-12-03 10:45:34 -05:00
Steven Rostedt	0429578016	tracing/events: Show real number in array fields Currently we have in something like the sched_switch event: field:char prev_comm[TASK_COMM_LEN]; offset:12; size:16; signed:1; When a userspace tool such as perf tries to parse this, the TASK_COMM_LEN is meaningless. This is done because the TRACE_EVENT() macro simply uses a #len to show the string of the length. When the length is an enum, we get a string that means nothing for tools. By adding a static buffer and a mutex to protect it, we can store the string into that buffer with snprintf and show the actual number. Now we get: field:char prev_comm[16]; offset:12; size:16; signed:1; Something much more useful. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-11-19 10:18:47 -05:00
Mark Brown	9e18e1869f	Merge branch 'next-samsung-asoc' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into for-2.6.38	2010-11-19 11:49:35 +00:00
Ingo Molnar	ae51ce9061	Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core	2010-11-18 20:07:12 +01:00
Frederic Weisbecker	53cf810b19	tracing: Allow syscall trace events for non privileged users As for the raw syscalls events, individual syscall events won't leak system wide information on task bound tracing. Allow non privileged users to use them in such workflow. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Jason Baron <jbaron@redhat.com>	2010-11-18 14:37:44 +01:00
Frederic Weisbecker	fe5542030d	tracing: Allow raw syscall trace events for non privileged users This allows non privileged users to use the raw syscall trace events for task bound tracing in perf. It is safe because raw syscall trace events don't leak system wide informations. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Jason Baron <jbaron@redhat.com>	2010-11-18 14:37:43 +01:00
Frederic Weisbecker	1ed0c59711	tracing: New macro to set up initial event flags value This introduces the new TRACE_EVENT_FLAGS() macro in order to set up initial event flags value. This macro must simply follow the definition of a trace event and take the event name and the flag value as parameters: TRACE_EVENT(my_event, ..... .... ); TRACE_EVENT_FLAGS(my_event, 1) This will set up 1 as the initial my_event->flags value. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Jason Baron <jbaron@redhat.com>	2010-11-18 14:37:42 +01:00
Mike Snitzer	d07335e51d	block: Rename "block_remap" tracepoint to "block_bio_remap" to clarify the event. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-11-16 12:53:39 +01:00
Mark Brown	84e909303d	ASoC: Add DAPM trace events Trace events for DAPM allow us to monitor the performance and behaviour of DAPM with logging which can be built into the kernel permanantly, is more suited to automated analysis and display and less likely to suffer interference from other logging activity. Currently trace events are generated for: - Start and stop of DAPM processing - Start and stop of bias level changes - Power decisions for widgets - Widget event execution start and stop giving some view as to what is happening and where latencies occur. Actual changes in widget power can be seen via the register write trace in soc-core. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>	2010-11-11 14:54:31 +00:00
Mark Brown	a8b1d34f3e	ASoC: Add trace events for ASoC register read/write The trace subsystem provides a convenient way of instrumenting the kernel which can be left on all the time with extremely low impact on the system unlike prints to the kernel log which can be very spammy. Begin adding support for instrumenting ASoC via this interface by adding trace for the register access primitives. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>	2010-11-11 14:54:26 +00:00
Theodore Ts'o	7ff9c073dd	ext4: Add new ext4 inode tracepoints Add ext4_evict_inode, ext4_drop_inode, ext4_mark_inode_dirty, and ext4_begin_ordered_truncate() Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-11-08 13:51:33 -05:00
Theodore Ts'o	a107e5a3a4	Merge branch 'next' into upstream-merge Conflicts: fs/ext4/inode.c fs/ext4/mballoc.c include/trace/events/ext4.h	2010-10-27 23:44:47 -04:00
Theodore Ts'o	a269029d0e	ext4,jbd2: convert tracepoints to use major/minor numbers Unfortunately perf can't deal with anything other than direct structure accesses in the TP_printk() section. It will drop dead when it sees jbd2_dev_to_name() in the "print fmt" section of the tracepoint. Addresses-Google-Bug: 3138508 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 22:08:50 -04:00
Linus Torvalds	a042e26137	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits) perf python scripting: Add futex-contention script perf python scripting: Fixup cut'n'paste error in sctop script perf scripting: Shut up 'perf record' final status perf record: Remove newline character from perror() argument perf python scripting: Support fedora 11 (audit 1.7.17) perf python scripting: Improve the syscalls-by-pid script perf python scripting: print the syscall name on sctop perf python scripting: Improve the syscalls-counts script perf python scripting: Improve the failed-syscalls-by-pid script kprobes: Remove redundant text_mutex lock in optimize x86/oprofile: Fix uninitialized variable use in debug printk tracing: Fix 'faild' -> 'failed' typo perf probe: Fix format specified for Dwarf_Off parameter perf trace: Fix detection of script extension perf trace: Use $PERF_EXEC_PATH in canned report scripts perf tools: Document event modifiers perf tools: Remove direct slang.h include perf_events: Fix for transaction recovery in group_sched_in() perf_events: Revert: Fix transaction recovery in group_sched_in() perf, x86: Use NUMA aware allocations for PEBS/BTS/DS allocations ...	2010-10-27 18:48:00 -07:00
Eric Sandeen	3e1e5f5016	ext4: don't use ext4_allocation_contexts for tracing Many tracepoints were populating an ext4_allocation_context to pass in, but this requires a slab allocation even when tracepoints are off. In fact, 4 of 5 of these allocations were only for tracing. In addition, we were only using a small fraction of the 144 bytes of this structure for this purpose. We can do away with all these alloc/frees of the ac and simply pass in the bits we care about, instead. I tested this by turning on tracing and running through xfstests on x86_64. I did not actually do anything with the trace output, however. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:07 -04:00
Eric Sandeen	4d5476164a	ext4: fix oops in trace_ext4_mb_release_group_pa Our QA reported an oops in the ext4_mb_release_group_pa tracing, and Josef Bacik pointed out that it was because we may have a non-null but uninitialized ac_inode in the allocation context. I can reproduce it when running xfstests with ext4 tracepoints on, on a CONFIG_SLAB_DEBUG kernel. We call trace_ext4_mb_release_group_pa from 2 places, ext4_mb_discard_group_preallocations and ext4_mb_discard_lg_preallocations In both cases we allocate an ac as a container just for tracing (!) and never fill in the ac_inode. There's no reason to be assigning, testing, or printing it as far as I can see, so just remove it from the tracepoint. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:07 -04:00
Wen Congyang	b853fd3648	ext4: avoid null dereference in trace_ext4_mballoc_discard ac->inode is set to null in function ext4_mb_release_group_pa(), and then trace_ext4_mballoc_discard(ac) is called, the kernel will panic. BUG: unable to handle kernel NULL pointer dereference at 000000a4 IP: [<f87e1714>] ftrace_raw_event_ext4__mballoc+0x54/0xc0 [ext4] pdpt = 0000000000abd001 pde = 0000000000000000 Oops: 0000 [#1] SMP Pid: 550, comm: flush-8:16 Not tainted 2.6.36-rc1 #1 SE7320EP2/Altos G530 EIP: 0060:[<f87e1714>] EFLAGS: 00010206 CPU: 1 EIP is at ftrace_raw_event_ext4__mballoc+0x54/0xc0 [ext4] EAX: f32ac840 EBX: f3f1cf88 ECX: f32ac840 EDX: 00000000 ESI: f32ac83c EDI: f880b9d8 EBP: 00000000 ESP: f4b77ae4 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process flush-8:16 (pid: 550, ti=f4b76000 task=f613e540 task.ti=f4b76000) Call Trace: [<f87f5ac1>] ? ext4_mb_release_group_pa+0x121/0x150 [ext4] [<f87f8356>] ? ext4_mb_discard_group_preallocations+0x336/0x400 [ext4] [<f87fb7f1>] ? ext4_mb_new_blocks+0x3d1/0x4f0 [ext4] [<c05a6c5b>] ? __make_request+0x10b/0x440 [<f87f1fb4>] ? ext4_ext_map_blocks+0x1334/0x1980 [ext4] [<c04ac78a>] ? rb_reserve_next_event+0xaa/0x3b0 [<f87d18d6>] ? ext4_map_blocks+0xd6/0x1d0 [ext4] [<f87d2da7>] ? mpage_da_map_blocks+0xc7/0x8a0 [ext4] [<c04c8a68>] ? find_get_pages_tag+0x38/0x110 [<c04d23a5>] ? __pagevec_release+0x15/0x20 [<f87d3ca5>] ? ext4_da_writepages+0x2b5/0x5d0 [ext4] [<c04cfbe0>] ? __writepage+0x0/0x30 [<c04d0e34>] ? do_writepages+0x14/0x30 [<c0526600>] ? writeback_single_inode+0xa0/0x240 [<c0526971>] ? writeback_sb_inodes+0xc1/0x180 [<c0526ab8>] ? writeback_inodes_wb+0x88/0x140 [<c0526d7b>] ? wb_writeback+0x20b/0x320 [<c045aca7>] ? lock_timer_base+0x27/0x50 [<c0526fe0>] ? wb_do_writeback+0x150/0x190 [<c05270a8>] ? bdi_writeback_thread+0x88/0x1f0 [<c043b680>] ? complete+0x40/0x60 [<c0527020>] ? bdi_writeback_thread+0x0/0x1f0 [<c0469474>] ? kthread+0x74/0x80 [<c0469400>] ? kthread+0x0/0x80 [<c040a23e>] ? kernel_thread_helper+0x6/0x10 Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:27:12 -04:00
Mel Gorman	0e093d9976	writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone If congestion_wait() is called with no BDI congested, the caller will sleep for the full timeout and this may be an unnecessary sleep. This patch adds a wait_iff_congested() that checks congestion and only sleeps if a BDI is congested else, it calls cond_resched() to ensure the caller is not hogging the CPU longer than its quota but otherwise will not sleep. This is aimed at reducing some of the major desktop stalls reported during IO. For example, while kswapd is operating, it calls congestion_wait() but it could just have been reclaiming clean page cache pages with no congestion. Without this patch, it would sleep for a full timeout but after this patch, it'll just call schedule() if it has been on the CPU too long. Similar logic applies to direct reclaimers that are not making enough progress. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:07 -07:00
KOSAKI Motohiro	7d3579e8e6	vmscan: narrow the scenarios in whcih lumpy reclaim uses synchrounous reclaim shrink_page_list() can decide to give up reclaiming a page under a number of conditions such as 1. trylock_page() failure 2. page is unevictable 3. zone reclaim and page is mapped 4. PageWriteback() is true 5. page is swapbacked and swap is full 6. add_to_swap() failure 7. page is dirty and gfpmask don't have GFP_IO, GFP_FS 8. page is pinned 9. IO queue is congested 10. pageout() start IO, but not finished With lumpy reclaim, failures result in entering synchronous lumpy reclaim but this can be unnecessary. In cases (2), (3), (5), (6), (7) and (8), there is no point retrying. This patch causes lumpy reclaim to abort when it is known it will fail. Case (9) is more interesting. current behavior is, 1. start shrink_page_list(async) 2. found queue_congested() 3. skip pageout write 4. still start shrink_page_list(sync) 5. wait on a lot of pages 6. again, found queue_congested() 7. give up pageout write again So, it's useless time wasting. However, just skipping page reclaim is also notgood as x86 allocating a huge page needs 512 pages for example. It can have more dirty pages than queue congestion threshold (~=128). After this patch, pageout() behaves as follows; - If order > PAGE_ALLOC_COSTLY_ORDER Ignore queue congestion always. - If order <= PAGE_ALLOC_COSTLY_ORDER skip write page and disable lumpy reclaim. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:07 -07:00
Mel Gorman	52bb919866	writeback: account for time spent congestion_waited There is strong evidence to indicate a lot of time is being spent in congestion_wait(), some of it unnecessarily. This patch adds a tracepoint for congestion_wait to record when congestion_wait() was called, how long the timeout was for and how long it actually slept. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:07 -07:00
Mel Gorman	e11da5b4fd	tracing, vmscan: add trace events for LRU list shrinking There have been numerous reports of stalls that pointed at the problem being somewhere in the VM. There are multiple roots to the problems which means dealing with any of the root problems in isolation is tricky to justify on their own and they would still need integration testing. This patch series puts together two different patch sets which in combination should tackle some of the root causes of latency problems being reported. Patch 1 adds a tracepoint for shrink_inactive_list. For this series, the most important results is being able to calculate the scanning/reclaim ratio as a measure of the amount of work being done by page reclaim. Patch 2 accounts for time spent in congestion_wait. Patches 3-6 were originally developed by Kosaki Motohiro but reworked for this series. It has been noted that lumpy reclaim is far too aggressive and trashes the system somewhat. As SLUB uses high-order allocations, a large cost incurred by lumpy reclaim will be noticeable. It was also reported during transparent hugepage support testing that lumpy reclaim was trashing the system and these patches should mitigate that problem without disabling lumpy reclaim. Patch 7 adds wait_iff_congested() and replaces some callers of congestion_wait(). wait_iff_congested() only sleeps if there is a BDI that is currently congested. Patch 8 notes that any BDI being congested is not necessarily a problem because there could be multiple BDIs of varying speeds and numberous zones. It attempts to track when a zone being reclaimed contains many pages backed by a congested BDI and if so, reclaimers wait on the congestion queue. I ran a number of tests with monitoring on X86, X86-64 and PPC64. Each machine had 3G of RAM and the CPUs were X86: Intel P4 2-core X86-64: AMD Phenom 4-core PPC64: PPC970MP Each used a single disk and the onboard IO controller. Dirty ratio was left at 20. I'm just going to report for X86-64 and PPC64 in a vague attempt to keep this report short. Four kernels were tested each based on v2.6.36-rc4 traceonly-v2r2: Patches 1 and 2 to instrument vmscan reclaims and congestion_wait lowlumpy-v2r3: Patches 1-6 to test if lumpy reclaim is better waitcongest-v2r3: Patches 1-7 to only wait on congestion waitwriteback-v2r4: Patches 1-8 to detect when a zone is congested nocongest-v1r5: Patches 1-3 for testing wait_iff_congestion nodirect-v1r5: Patches 1-10 to disable filesystem writeback for better IO The tests run were as follows kernbench compile-based benchmark. Smoke test performance sysbench OLTP read-only benchmark. Will be re-run in the future as read-write micro-mapped-file-stream This is a micro-benchmark from Johannes Weiner that accesses a large sparse-file through mmap(). It was configured to run in only single-CPU mode but can be indicative of how well page reclaim identifies suitable pages. stress-highalloc Tries to allocate huge pages under heavy load. kernbench, iozone and sysbench did not report any performance regression on any machine. sysbench did pressure the system lightly and there was reclaim activity but there were no difference of major interest between the kernels. X86-64 micro-mapped-file-stream traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3 waitwriteback-v2r4 pgalloc_dma 1639.00 ( 0.00%) 667.00 (-145.73%) 1167.00 ( -40.45%) 578.00 (-183.56%) pgalloc_dma32 2842410.00 ( 0.00%) 2842626.00 ( 0.01%) 2843043.00 ( 0.02%) 2843014.00 ( 0.02%) pgalloc_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) pgsteal_dma 729.00 ( 0.00%) 85.00 (-757.65%) 609.00 ( -19.70%) 125.00 (-483.20%) pgsteal_dma32 2338721.00 ( 0.00%) 2447354.00 ( 4.44%) 2429536.00 ( 3.74%) 2436772.00 ( 4.02%) pgsteal_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) pgscan_kswapd_dma 1469.00 ( 0.00%) 532.00 (-176.13%) 1078.00 ( -36.27%) 220.00 (-567.73%) pgscan_kswapd_dma32 4597713.00 ( 0.00%) 4503597.00 ( -2.09%) 4295673.00 ( -7.03%) 3891686.00 ( -18.14%) pgscan_kswapd_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) pgscan_direct_dma 71.00 ( 0.00%) 134.00 ( 47.01%) 243.00 ( 70.78%) 352.00 ( 79.83%) pgscan_direct_dma32 305820.00 ( 0.00%) 280204.00 ( -9.14%) 600518.00 ( 49.07%) 957485.00 ( 68.06%) pgscan_direct_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) pageoutrun 16296.00 ( 0.00%) 21254.00 ( 23.33%) 18447.00 ( 11.66%) 20067.00 ( 18.79%) allocstall 443.00 ( 0.00%) 273.00 ( -62.27%) 513.00 ( 13.65%) 1568.00 ( 71.75%) These are based on the raw figures taken from /proc/vmstat. It's a rough measure of reclaim activity. Note that allocstall counts are higher because we are entering direct reclaim more often as a result of not sleeping in congestion. In itself, it's not necessarily a bad thing. It's easier to get a view of what happened from the vmscan tracepoint report. FTrace Reclaim Statistics: vmscan traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3 waitwriteback-v2r4 Direct reclaims 443 273 513 1568 Direct reclaim pages scanned 305968 280402 600825 957933 Direct reclaim pages reclaimed 43503 19005 30327 117191 Direct reclaim write file async I/O 0 0 0 0 Direct reclaim write anon async I/O 0 3 4 12 Direct reclaim write file sync I/O 0 0 0 0 Direct reclaim write anon sync I/O 0 0 0 0 Wake kswapd requests 187649 132338 191695 267701 Kswapd wakeups 3 1 4 1 Kswapd pages scanned 4599269 4454162 4296815 3891906 Kswapd pages reclaimed 2295947 2428434 2399818 2319706 Kswapd reclaim write file async I/O 1 0 1 1 Kswapd reclaim write anon async I/O 59 187 41 222 Kswapd reclaim write file sync I/O 0 0 0 0 Kswapd reclaim write anon sync I/O 0 0 0 0 Time stalled direct reclaim (seconds) 4.34 2.52 6.63 2.96 Time kswapd awake (seconds) 11.15 10.25 11.01 10.19 Total pages scanned 4905237 4734564 4897640 4849839 Total pages reclaimed `2339450` 2447439 2430145 2436897 %age total pages scanned/reclaimed 47.69% 51.69% 49.62% 50.25% %age total pages scanned/written 0.00% 0.00% 0.00% 0.00% %age file pages scanned/written 0.00% 0.00% 0.00% 0.00% Percentage Time Spent Direct Reclaim 29.23% 19.02% 38.48% 20.25% Percentage Time kswapd Awake 78.58% 78.85% 76.83% 79.86% What is interesting here for nocongest in particular is that while direct reclaim scans more pages, the overall number of pages scanned remains the same and the ratio of pages scanned to pages reclaimed is more or less the same. In other words, while we are sleeping less, reclaim is not doing more work and as direct reclaim and kswapd is awake for less time, it would appear to be doing less work. FTrace Reclaim Statistics: congestion_wait Direct number congest waited 87 196 64 0 Direct time congest waited 4604ms 4732ms 5420ms 0ms Direct full congest waited 72 145 53 0 Direct number conditional waited 0 0 324 1315 Direct time conditional waited 0ms 0ms 0ms 0ms Direct full conditional waited 0 0 0 0 KSwapd number congest waited 20 10 15 7 KSwapd time congest waited 1264ms 536ms 884ms 284ms KSwapd full congest waited 10 4 6 2 KSwapd number conditional waited 0 0 0 0 KSwapd time conditional waited 0ms 0ms 0ms 0ms KSwapd full conditional waited 0 0 0 0 The vanilla kernel spent 8 seconds asleep in direct reclaim and no time at all asleep with the patches. MMTests Statistics: duration User/Sys Time Running Test (seconds) 10.51 10.73 10.6 11.66 Total Elapsed Time (seconds) 14.19 13.00 14.33 12.76 Overall, the tests completed faster. It is interesting to note that backing off further when a zone is congested and not just a BDI was more efficient overall. PPC64 micro-mapped-file-stream pgalloc_dma 3024660.00 ( 0.00%) 3027185.00 ( 0.08%) 3025845.00 ( 0.04%) 3026281.00 ( 0.05%) pgalloc_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) pgsteal_dma 2508073.00 ( 0.00%) 2565351.00 ( 2.23%) 2463577.00 ( -1.81%) 2532263.00 ( 0.96%) pgsteal_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) pgscan_kswapd_dma 4601307.00 ( 0.00%) 4128076.00 ( -11.46%) 3912317.00 ( -17.61%) 3377165.00 ( -36.25%) pgscan_kswapd_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) pgscan_direct_dma 629825.00 ( 0.00%) 971622.00 ( 35.18%) 1063938.00 ( 40.80%) 1711935.00 ( 63.21%) pgscan_direct_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) pageoutrun 27776.00 ( 0.00%) 20458.00 ( -35.77%) 18763.00 ( -48.04%) 18157.00 ( -52.98%) allocstall 977.00 ( 0.00%) 2751.00 ( 64.49%) 2098.00 ( 53.43%) 5136.00 ( 80.98%) Similar trends to x86-64. allocstalls are up but it's not necessarily bad. FTrace Reclaim Statistics: vmscan Direct reclaims 977 2709 2098 5136 Direct reclaim pages scanned 629825 963814 1063938 1711935 Direct reclaim pages reclaimed 75550 242538 150904 387647 Direct reclaim write file async I/O 0 0 0 2 Direct reclaim write anon async I/O 0 10 0 4 Direct reclaim write file sync I/O 0 0 0 0 Direct reclaim write anon sync I/O 0 0 0 0 Wake kswapd requests 392119 1201712 571935 571921 Kswapd wakeups 3 2 3 3 Kswapd pages scanned 4601307 4128076 3912317 3377165 Kswapd pages reclaimed 2432523 2318797 2312673 2144616 Kswapd reclaim write file async I/O 20 1 1 1 Kswapd reclaim write anon async I/O 57 132 11 121 Kswapd reclaim write file sync I/O 0 0 0 0 Kswapd reclaim write anon sync I/O 0 0 0 0 Time stalled direct reclaim (seconds) 6.19 7.30 13.04 10.88 Time kswapd awake (seconds) 21.73 26.51 25.55 23.90 Total pages scanned 5231132 5091890 4976255 5089100 Total pages reclaimed 2508073 2561335 2463577 2532263 %age total pages scanned/reclaimed 47.95% 50.30% 49.51% 49.76% %age total pages scanned/written 0.00% 0.00% 0.00% 0.00% %age file pages scanned/written 0.00% 0.00% 0.00% 0.00% Percentage Time Spent Direct Reclaim 18.89% 20.65% 32.65% 27.65% Percentage Time kswapd Awake 72.39% 80.68% 78.21% 77.40% Again, a similar trend that the congestion_wait changes mean that direct reclaim scans more pages but the overall number of pages scanned while slightly reduced, are very similar. The ratio of scanning/reclaimed remains roughly similar. The downside is that kswapd and direct reclaim was awake longer and for a larger percentage of the overall workload. It's possible there were big differences in the amount of time spent reclaiming slab pages between the different kernels which is plausible considering that the micro tests runs after fsmark and sysbench. Trace Reclaim Statistics: congestion_wait Direct number congest waited 845 1312 104 0 Direct time congest waited 19416ms 26560ms 7544ms 0ms Direct full congest waited 745 1105 72 0 Direct number conditional waited 0 0 1322 2935 Direct time conditional waited 0ms 0ms 12ms 312ms Direct full conditional waited 0 0 0 3 KSwapd number congest waited 39 102 75 63 KSwapd time congest waited 2484ms 6760ms 5756ms 3716ms KSwapd full congest waited 20 48 46 25 KSwapd number conditional waited 0 0 0 0 KSwapd time conditional waited 0ms 0ms 0ms 0ms KSwapd full conditional waited 0 0 0 0 The vanilla kernel spent 20 seconds asleep in direct reclaim and only 312ms asleep with the patches. The time kswapd spent congest waited was also reduced by a large factor. MMTests Statistics: duration ser/Sys Time Running Test (seconds) 26.58 28.05 26.9 28.47 Total Elapsed Time (seconds) 30.02 32.86 32.67 30.88 With all patches applies, the completion times are very similar. X86-64 STRESS-HIGHALLOC traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3waitwriteback-v2r4 Pass 1 82.00 ( 0.00%) 84.00 ( 2.00%) 85.00 ( 3.00%) 85.00 ( 3.00%) Pass 2 90.00 ( 0.00%) 87.00 (-3.00%) 88.00 (-2.00%) 89.00 (-1.00%) At Rest 92.00 ( 0.00%) 90.00 (-2.00%) 90.00 (-2.00%) 91.00 (-1.00%) Success figures across the board are broadly similar. traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3waitwriteback-v2r4 Direct reclaims 1045 944 886 887 Direct reclaim pages scanned 135091 119604 109382 101019 Direct reclaim pages reclaimed 88599 47535 47863 46671 Direct reclaim write file async I/O 494 283 465 280 Direct reclaim write anon async I/O 29357 13710 16656 13462 Direct reclaim write file sync I/O 154 2 2 3 Direct reclaim write anon sync I/O 14594 571 509 561 Wake kswapd requests 7491 933 872 892 Kswapd wakeups 814 778 731 780 Kswapd pages scanned 7290822 15341158 11916436 13703442 Kswapd pages reclaimed 3587336 3142496 3094392 3187151 Kswapd reclaim write file async I/O 91975 32317 28022 29628 Kswapd reclaim write anon async I/O 1992022 789307 829745 849769 Kswapd reclaim write file sync I/O 0 0 0 0 Kswapd reclaim write anon sync I/O 0 0 0 0 Time stalled direct reclaim (seconds) 4588.93 2467.16 2495.41 2547.07 Time kswapd awake (seconds) 2497.66 1020.16 1098.06 1176.82 Total pages scanned 7425913 15460762 12025818 13804461 Total pages reclaimed 3675935 3190031 3142255 3233822 %age total pages scanned/reclaimed 49.50% 20.63% 26.13% 23.43% %age total pages scanned/written 28.66% 5.41% 7.28% 6.47% %age file pages scanned/written 1.25% 0.21% 0.24% 0.22% Percentage Time Spent Direct Reclaim 57.33% 42.15% 42.41% 42.99% Percentage Time kswapd Awake 43.56% 27.87% 29.76% 31.25% Scanned/reclaimed ratios again look good with big improvements in efficiency. The Scanned/written ratios also look much improved. With a better scanned/written ration, there is an expectation that IO would be more efficient and indeed, the time spent in direct reclaim is much reduced by the full series and kswapd spends a little less time awake. Overall, indications here are that allocations were happening much faster and this can be seen with a graph of the latency figures as the allocations were taking place http://www.csn.ul.ie/~mel/postings/vmscanreduce-20101509/highalloc-interlatency-hydra-mean.ps FTrace Reclaim Statistics: congestion_wait Direct number congest waited 1333 204 169 4 Direct time congest waited 78896ms 8288ms 7260ms 200ms Direct full congest waited 756 92 69 2 Direct number conditional waited 0 0 26 186 Direct time conditional waited 0ms 0ms 0ms 2504ms Direct full conditional waited 0 0 0 25 KSwapd number congest waited 4 395 227 282 KSwapd time congest waited 384ms 25136ms 10508ms 18380ms KSwapd full congest waited 3 232 98 176 KSwapd number conditional waited 0 0 0 0 KSwapd time conditional waited 0ms 0ms 0ms 0ms KSwapd full conditional waited 0 0 0 0 KSwapd full conditional waited 318 0 312 9 Overall, the time spent speeping is reduced. kswapd is still hitting congestion_wait() but that is because there are callers remaining where it wasn't clear in advance if they should be changed to wait_iff_congested() or not. Overall the sleep imes are reduced though - from 79ish seconds to about 19. MMTests Statistics: duration User/Sys Time Running Test (seconds) 3415.43 3386.65 3388.39 3377.5 Total Elapsed Time (seconds) 5733.48 3660.33 3689.41 3765.39 With the full series, the time to complete the tests are reduced by 30% PPC64 STRESS-HIGHALLOC traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3waitwriteback-v2r4 Pass 1 17.00 ( 0.00%) 34.00 (17.00%) 38.00 (21.00%) 43.00 (26.00%) Pass 2 25.00 ( 0.00%) 37.00 (12.00%) 42.00 (17.00%) 46.00 (21.00%) At Rest 49.00 ( 0.00%) 43.00 (-6.00%) 45.00 (-4.00%) 51.00 ( 2.00%) Success rates there are way up particularly considering that the 16MB huge pages on PPC64 mean that it's always much harder to allocate them. FTrace Reclaim Statistics: vmscan stress-highalloc stress-highalloc stress-highalloc stress-highalloc traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3waitwriteback-v2r4 Direct reclaims 499 505 564 509 Direct reclaim pages scanned 223478 41898 51818 45605 Direct reclaim pages reclaimed 137730 21148 27161 23455 Direct reclaim write file async I/O 399 136 162 136 Direct reclaim write anon async I/O 46977 2865 4686 3998 Direct reclaim write file sync I/O 29 0 1 3 Direct reclaim write anon sync I/O 31023 159 237 239 Wake kswapd requests 420 351 360 326 Kswapd wakeups 185 294 249 277 Kswapd pages scanned 15703488 16392500 17821724 17598737 Kswapd pages reclaimed 5808466 2908858 3139386 3145435 Kswapd reclaim write file async I/O 159938 18400 18717 13473 Kswapd reclaim write anon async I/O 3467554 228957 322799 234278 Kswapd reclaim write file sync I/O 0 0 0 0 Kswapd reclaim write anon sync I/O 0 0 0 0 Time stalled direct reclaim (seconds) 9665.35 1707.81 2374.32 1871.23 Time kswapd awake (seconds) 9401.21 1367.86 1951.75 1328.88 Total pages scanned 15926966 16434398 17873542 17644342 Total pages reclaimed 5946196 2930006 3166547 3168890 %age total pages scanned/reclaimed 37.33% 17.83% 17.72% 17.96% %age total pages scanned/written 23.27% 1.52% 1.94% 1.43% %age file pages scanned/written 1.01% 0.11% 0.11% 0.08% Percentage Time Spent Direct Reclaim 44.55% 35.10% 41.42% 36.91% Percentage Time kswapd Awake 86.71% 43.58% 52.67% 41.14% While the scanning rates are slightly up, the scanned/reclaimed and scanned/written figures are much improved. The time spent in direct reclaim and with kswapd are massively reduced, mostly by the lowlumpy patches. FTrace Reclaim Statistics: congestion_wait Direct number congest waited 725 303 126 3 Direct time congest waited 45524ms 9180ms 5936ms 300ms Direct full congest waited 487 190 52 3 Direct number conditional waited 0 0 200 301 Direct time conditional waited 0ms 0ms 0ms 1904ms Direct full conditional waited 0 0 0 19 KSwapd number congest waited 0 2 23 4 KSwapd time congest waited 0ms 200ms 420ms 404ms KSwapd full congest waited 0 2 2 4 KSwapd number conditional waited 0 0 0 0 KSwapd time conditional waited 0ms 0ms 0ms 0ms KSwapd full conditional waited 0 0 0 0 Not as dramatic a story here but the time spent asleep is reduced and we can still see what wait_iff_congested is going to sleep when necessary. MMTests Statistics: duration User/Sys Time Running Test (seconds) 12028.09 3157.17 3357.79 3199.16 Total Elapsed Time (seconds) 10842.07 3138.72 3705.54 3229.85 The time to complete this test goes way down. With the full series, we are allocating over twice the number of huge pages in 30% of the time and there is a corresponding impact on the allocation latency graph available at. http://www.csn.ul.ie/~mel/postings/vmscanreduce-20101509/highalloc-interlatency-powyah-mean.ps This patch: Add a trace event for shrink_inactive_list() and updates the sample postprocessing script appropriately. It can be used to determine how many pages were reclaimed and for non-lumpy reclaim where exactly the pages were reclaimed from. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:07 -07:00
Wu Fengguang	1b430beee5	writeback: remove nonblocking/encountered_congestion references This removes more dead code that was somehow missed by commit `0d99519efe` (writeback: remove unused nonblocking and congestion checks). There are no behavior change except for the removal of two entries from one of the ext4 tracing interface. The nonblocking checks in ->writepages are no longer used because the flusher now prefer to block on get_request_wait() than to skip inodes on IO congestion. The latter will lead to more seeky IO. The nonblocking checks in ->writepage are no longer used because it's redundant with the WB_SYNC_NONE check. We no long set ->nonblocking in VM page out and page migration, because a) it's effectively redundant with WB_SYNC_NONE in current code b) it's old semantic of "Don't get stuck on request queues" is mis-behavior: that would skip some dirty inodes on congestion and page out others, which is unfair in terms of LRU age. Inspired by Christoph Hellwig. Thanks! Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: David Howells <dhowells@redhat.com> Cc: Sage Weil <sage@newdream.net> Cc: Steve French <sfrench@samba.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:05 -07:00
Ingo Molnar	b8ecad8b2f	Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/urgent	2010-10-23 20:05:43 +02:00
Linus Torvalds	91b745016c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: remove in_workqueue_context() workqueue: Clarify that schedule_on_each_cpu is synchronous memory_hotplug: drop spurious calls to flush_scheduled_work() shpchp: update workqueue usage pciehp: update workqueue usage isdn/eicon: don't call flush_scheduled_work() from diva_os_remove_soft_isr() workqueue: add and use WQ_MEM_RECLAIM flag workqueue: fix HIGHPRI handling in keep_working() workqueue: add queue_work and activate_work trace points workqueue: prepare for more tracepoints workqueue: implement flush[_delayed]_work_sync() workqueue: factor out start_flush_work() workqueue: cleanup flush/cancel functions workqueue: implement alloc_ordered_workqueue() Fix up trivial conflict in fs/gfs2/main.c as per Tejun	2010-10-22 17:13:10 -07:00
Linus Torvalds	bc4016f481	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits) sched: Export account_system_vtime() sched: Call tick_check_idle before __irq_enter sched: Remove irq time from available CPU power sched: Do not account irq time to current task x86: Add IRQ_TIME_ACCOUNTING sched: Add IRQ_TIME_ACCOUNTING, finer accounting of irq time sched: Add a PF flag for ksoftirqd identification sched: Consolidate account_system_vtime extern declaration sched: Fix softirq time accounting sched: Drop group_capacity to 1 only if local group has extra capacity sched: Force balancing on newidle balance if local group has capacity sched: Set group_imb only a task can be pulled from the busiest cpu sched: Do not consider SCHED_IDLE tasks to be cache hot sched: Drop all load weight manipulation for RT tasks sched: Create special class for stop/migrate work sched: Unindent labels sched: Comment updates: fix default latency and granularity numbers tracing/sched: Add sched_pi_setprio tracepoint sched: Give CPU bound RT tasks preference sched: Try not to migrate higher priority RT tasks ...	2010-10-21 12:55:43 -07:00
Thomas Gleixner	f4bc6bb2d5	tracing: Cleanup the convoluted softirq tracepoints With the addition of trace_softirq_raise() the softirq tracepoint got even more convoluted. Why the tracepoints take two pointers to assign an integer is beyond my comprehension. But adding an extra case which treats the first pointer as an unsigned long when the second pointer is NULL including the back and forth type casting is just horrible. Convert the softirq tracepoints to take a single unsigned int argument for the softirq vector number and fix the call sites. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <alpine.LFD.2.00.1010191428560.6815@localhost6.localdomain6> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: mathieu.desnoyers@efficios.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org>	2010-10-21 16:50:29 +02:00
Tejun Heo	cdadf0097c	workqueue: add queue_work and activate_work trace points These two tracepoints allow tracking when and how a work is queued and activated. This patch is based on Frederic's patch to add queue_work trace point. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com>	2010-10-05 10:49:55 +02:00
Tejun Heo	97bd234701	workqueue: prepare for more tracepoints Define workqueue_work event class and use it for workqueue_execute_end trace point. Also, move trace/events/workqueue.h include downwards such that all struct definitions are visible to it. This is to prepare for more tracepoints and doesn't cause any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com>	2010-10-05 10:41:14 +02:00
Steven Rostedt	a8027073eb	tracing/sched: Add sched_pi_setprio tracepoint Add a tracepoint that shows the priority of a task being boosted via priority inheritance. Cc: Gregory Haskins <ghaskins@novell.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-09-21 10:56:41 -04:00
Jean Pihet	74704ac6ea	tracing, perf: Add more power related events This patch adds new generic events for dynamic power management tracing: - clock events class: used for clock enable/disable and for clock rate change, - power_domain events class: used for power domains transitions. The OMAP architecture will be using the new events for PM debugging, however the new events are made generic enough to be used by all platforms. Signed-off-by: Jean Pihet <j-pihet@ti.com> Acked-by: Thomas Renninger <trenn@suse.de> Cc: discuss@lesswatts.org Cc: linux-pm@lists.linux-foundation.org Cc: Thomas Renninger <trenn@suse.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Kevin Hilman <khilman@deeprootsystems.com> LKML-Reference: <AANLkTinUmbSUUuxUzc8++pcb9gd1CZFdyTQFrveTBXyV@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-17 09:10:43 +02:00

1 2 3 4 5 ...

496 Commits