Commit Graph

33 Commits

Author SHA1 Message Date
Theodore Ts'o
7ff9c073dd ext4: Add new ext4 inode tracepoints
Add ext4_evict_inode, ext4_drop_inode, ext4_mark_inode_dirty, and
ext4_begin_ordered_truncate()

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-11-08 13:51:33 -05:00
Theodore Ts'o
a107e5a3a4 Merge branch 'next' into upstream-merge
Conflicts:
	fs/ext4/inode.c
	fs/ext4/mballoc.c
	include/trace/events/ext4.h
2010-10-27 23:44:47 -04:00
Theodore Ts'o
a269029d0e ext4,jbd2: convert tracepoints to use major/minor numbers
Unfortunately perf can't deal with anything other than direct structure
accesses in the TP_printk() section.  It will drop dead when it sees
jbd2_dev_to_name() in the "print fmt" section of the tracepoint.

Addresses-Google-Bug: 3138508

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27 22:08:50 -04:00
Eric Sandeen
3e1e5f5016 ext4: don't use ext4_allocation_contexts for tracing
Many tracepoints were populating an ext4_allocation_context
to pass in, but this requires a slab allocation even when
tracepoints are off.  In fact, 4 of 5 of these allocations
were only for tracing.  In addition, we were only using a
small fraction of the 144 bytes of this structure for this
purpose.

We can do away with all these alloc/frees of the ac and
simply pass in the bits we care about, instead.

I tested this by turning on tracing and running through
xfstests on x86_64.  I did not actually do anything with
the trace output, however.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27 21:30:07 -04:00
Eric Sandeen
4d5476164a ext4: fix oops in trace_ext4_mb_release_group_pa
Our QA reported an oops in the ext4_mb_release_group_pa tracing,
and Josef Bacik pointed out that it was because we may have a
non-null but uninitialized ac_inode in the allocation context.

I can reproduce it when running xfstests with ext4 tracepoints on, 
on a CONFIG_SLAB_DEBUG kernel.

We call trace_ext4_mb_release_group_pa from 2 places, 
ext4_mb_discard_group_preallocations and 
ext4_mb_discard_lg_preallocations

In both cases we allocate an ac as a container just for tracing (!)
and never fill in the ac_inode.  There's no reason to be assigning,
testing, or printing it as far as I can see, so just remove it from
the tracepoint.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27 21:30:07 -04:00
Wen Congyang
b853fd3648 ext4: avoid null dereference in trace_ext4_mballoc_discard
ac->inode is set to null in function ext4_mb_release_group_pa(),
and then trace_ext4_mballoc_discard(ac) is called, the kernel
will panic.

BUG: unable to handle kernel NULL pointer dereference at 000000a4
IP: [<f87e1714>] ftrace_raw_event_ext4__mballoc+0x54/0xc0 [ext4]
*pdpt = 0000000000abd001 *pde = 0000000000000000
Oops: 0000 [#1] SMP

Pid: 550, comm: flush-8:16 Not tainted 2.6.36-rc1 #1 SE7320EP2/Altos G530
EIP: 0060:[<f87e1714>] EFLAGS: 00010206 CPU: 1
EIP is at ftrace_raw_event_ext4__mballoc+0x54/0xc0 [ext4]
EAX: f32ac840 EBX: f3f1cf88 ECX: f32ac840 EDX: 00000000
ESI: f32ac83c EDI: f880b9d8 EBP: 00000000 ESP: f4b77ae4
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process flush-8:16 (pid: 550, ti=f4b76000 task=f613e540 task.ti=f4b76000)
Call Trace:
 [<f87f5ac1>] ? ext4_mb_release_group_pa+0x121/0x150 [ext4]
 [<f87f8356>] ? ext4_mb_discard_group_preallocations+0x336/0x400 [ext4]
 [<f87fb7f1>] ? ext4_mb_new_blocks+0x3d1/0x4f0 [ext4]
 [<c05a6c5b>] ? __make_request+0x10b/0x440
 [<f87f1fb4>] ? ext4_ext_map_blocks+0x1334/0x1980 [ext4]
 [<c04ac78a>] ? rb_reserve_next_event+0xaa/0x3b0
 [<f87d18d6>] ? ext4_map_blocks+0xd6/0x1d0 [ext4]
 [<f87d2da7>] ? mpage_da_map_blocks+0xc7/0x8a0 [ext4]
 [<c04c8a68>] ? find_get_pages_tag+0x38/0x110
 [<c04d23a5>] ? __pagevec_release+0x15/0x20
 [<f87d3ca5>] ? ext4_da_writepages+0x2b5/0x5d0 [ext4]
 [<c04cfbe0>] ? __writepage+0x0/0x30
 [<c04d0e34>] ? do_writepages+0x14/0x30
 [<c0526600>] ? writeback_single_inode+0xa0/0x240
 [<c0526971>] ? writeback_sb_inodes+0xc1/0x180
 [<c0526ab8>] ? writeback_inodes_wb+0x88/0x140
 [<c0526d7b>] ? wb_writeback+0x20b/0x320
 [<c045aca7>] ? lock_timer_base+0x27/0x50
 [<c0526fe0>] ? wb_do_writeback+0x150/0x190
 [<c05270a8>] ? bdi_writeback_thread+0x88/0x1f0
 [<c043b680>] ? complete+0x40/0x60
 [<c0527020>] ? bdi_writeback_thread+0x0/0x1f0
 [<c0469474>] ? kthread+0x74/0x80
 [<c0469400>] ? kthread+0x0/0x80
 [<c040a23e>] ? kernel_thread_helper+0x6/0x10

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27 21:27:12 -04:00
Wu Fengguang
1b430beee5 writeback: remove nonblocking/encountered_congestion references
This removes more dead code that was somehow missed by commit 0d99519efe
(writeback: remove unused nonblocking and congestion checks).  There are
no behavior change except for the removal of two entries from one of the
ext4 tracing interface.

The nonblocking checks in ->writepages are no longer used because the
flusher now prefer to block on get_request_wait() than to skip inodes on
IO congestion.  The latter will lead to more seeky IO.

The nonblocking checks in ->writepage are no longer used because it's
redundant with the WB_SYNC_NONE check.

We no long set ->nonblocking in VM page out and page migration, because
a) it's effectively redundant with WB_SYNC_NONE in current code
b) it's old semantic of "Don't get stuck on request queues" is mis-behavior:
   that would skip some dirty inodes on congestion and page out others, which
   is unfair in terms of LRU age.

Inspired by Christoph Hellwig. Thanks!

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: David Howells <dhowells@redhat.com>
Cc: Sage Weil <sage@newdream.net>
Cc: Steve French <sfrench@samba.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:05 -07:00
Theodore Ts'o
e5880d76ae ext4: fix potential NULL dereference while tracing
The allocation_context pointer can be NULL.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-07-27 11:56:04 -04:00
Dave Chinner
0b5649278e writeback: pay attention to wbc->nr_to_write in write_cache_pages
If a filesystem writes more than one page in ->writepage, write_cache_pages
fails to notice this and continues to attempt writeback when wbc->nr_to_write
has gone negative - this trace was captured from XFS:

    wbc_writeback_start: towrt=1024
    wbc_writepage: towrt=1024
    wbc_writepage: towrt=0
    wbc_writepage: towrt=-1
    wbc_writepage: towrt=-5
    wbc_writepage: towrt=-21
    wbc_writepage: towrt=-85

This has adverse effects on filesystem writeback behaviour. write_cache_pages()
needs to terminate after a certain number of pages are written, not after a
certain number of calls to ->writepage are made.  This is a regression
introduced by 17bc6c30cf ("vfs: Add
no_nrwrite_index_update writeback control flag"), but cannot be reverted
directly due to subsequent bug fixes that have gone in on top of it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-08 18:12:44 -07:00
Christoph Hellwig
7ea8085910 drop unused dentry argument to ->fsync
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:05:02 -04:00
Li Zefan
f084db932e tracing: Convert more ext4 events to DEFINE_EVENT
Use DECLARE_EVENT_CLASS, and save ~2.7K:

   text    data     bss     dec     hex filename
 274441    7200     260  281901   44d2d fs/ext4/ext4.o.orig
 271881    7040     256  279177   44289 fs/ext4/ext4.o

4 events are converted:

  ext4__mb_new_pa: ext4_mb_new_inode_pa, ext4_mb_new_group_pa
  ext4__mballoc:   ext4_mballoc_discard, ext4_mballoc_free

No change in functionality.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-17 04:00:00 -04:00
Theodore Ts'o
f307333e14 ext4: Add new tracepoints to track mballoc's buddy bitmap loads
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-17 03:00:00 -04:00
Theodore Ts'o
f8ec9d6837 ext4: Add new tracepoints to debug delayed allocation space functions
Add tracepoints for ext4_da_reserve_space(),
ext4_da_update_reserve_space(), and ext4_da_release_space().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-01-01 01:00:21 -05:00
Linus Torvalds
4515c3069d Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits)
  ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem)
  ext4: Do not override ext2 or ext3 if built they are built as modules
  jbd2: Export jbd2_log_start_commit to fix ext4 build
  ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT
  ext4: Wait for proper transaction commit on fsync
  ext4: fix incorrect block reservation on quota transfer.
  ext4: quota macros cleanup
  ext4: ext4_get_reserved_space() must return bytes instead of blocks
  ext4: remove blocks from inode prealloc list on failure
  ext4: wait for log to commit when umounting
  ext4: Avoid data / filesystem corruption when write fails to copy data
  ext4: Use ext4 file system driver for ext2/ext3 file system mounts
  ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks()
  jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer()
  ext4: remove unused parameter wbc from __ext4_journalled_writepage()
  ext4: remove encountered_congestion trace
  ext4: move_extent_per_page() cleanup
  ext4: initialize moved_len before calling ext4_move_extents()
  ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT
  ext4: use ext4_data_block_valid() in ext4_free_blocks()
  ...
2009-12-10 09:33:29 -08:00
Li Zefan
b5eb34c359 tracing: Convert some ext4 events to DEFINE_TRACE
Use DECLARE_EVENT_CLASS to remove duplicate code:

   text    data     bss     dec     hex filename
 294695    6104     340  301139   49853 fs/ext4/ext4.o.old
 289983    6104     324  296411   485db fs/ext4/ext4.o

5 events are convertd:

  ext4__write_begin: ext4_write_begin, ext4_da_write_begin
  ext4__write_end: ext4_{ordered, writeback, journalled}_write_end

No change in functionality.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B0E2938.2040708@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 09:14:05 +01:00
Wu Fengguang
b4d7241596 ext4: remove encountered_congestion trace
It is no longer set and scheduled to be removed.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-24 11:15:08 -05:00
Theodore Ts'o
6eebee6255 ext4: print i_mode in octal in ext4 tracepoints
Inode permissions are much easier to understand if they are printed in
octal.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-22 20:23:31 -05:00
Theodore Ts'o
e6362609b6 ext4: call ext4_forget() from ext4_free_blocks()
Add the facility for ext4_forget() to be called from
ext4_free_blocks().  This simplifies the code in a large number of
places, and centralizes most of the work of calling ext4_forget() into
a single place.

Also fix a bug in the extents migration code; it wasn't calling
ext4_forget() when releasing the indirect blocks during the
conversion.  As a result, if the system cashed during or shortly after
the extents migration, and the released indirect blocks get reused as
data blocks, the journal replay would corrupt the data blocks.  With
this new patch, fixing this bug was as simple as adding the
EXT4_FREE_BLOCKS_FORGET flags to the call to ext4_free_blocks().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
2009-11-23 07:17:05 -05:00
Theodore Ts'o
beac2da756 ext4: add tracepoint for ext4_forget()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-23 07:25:08 -05:00
Josh Stone
0ef1224940 ext4: Add a stub for mpage_da_data in the trace header
The tracepoint ext4_da_write_pages has a struct mpage_da_data*
parameter, but that struct is only defined in fs/ext4/ext4.h.  This
patch adds a forward declaration for that struct, so this tracepoint
header can still be used by tools like SystemTap.

This is a continuation of the fix in commit 3661d286.

http://sourceware.org/bugzilla/show_bug.cgi?id=10703

Signed-off-by: Josh Stone <jistone@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-30 00:51:22 -04:00
Theodore Ts'o
296c355cd6 ext4: Use tracepoints for mb_history trace file
The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a
number of problems: it required a largish amount of memory to be
allocated for each ext4 filesystem, and the s_mb_history_lock
introduced a CPU contention problem.  

By ripping out the mb_history code and replacing it with ftrace
tracepoints, and we get more functionality: timestamps, event
filtering, the ability to correlate mballoc history with other ext4
tracepoints, etc.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-30 00:32:42 -04:00
Theodore Ts'o
55138e0bc2 ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks
Work around problems in the writeback code to force out writebacks in
larger chunks than just 4mb, which is just too small.  This also works
around limitations in the ext4 block allocator, which can't allocate
more than 2048 blocks at a time.  So we need to defeat the round-robin
characteristics of the writeback code and try to write out as many
blocks in one inode before allowing the writeback code to move on to
another inode.  We add a a new per-filesystem tunable,
max_writeback_mb_bump, which caps this to a default of 128mb per
inode.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-29 13:31:31 -04:00
Theodore Ts'o
fb40ba0d98 ext4: Add a tracepoint for ext4_alloc_da_blocks()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-16 19:30:40 -04:00
Theodore Ts'o
3661d28615 ext4: Fix include/trace/events/ext4.h to work with Systemtap
Using relative pathnames in #include statements interacts badly with
SystemTap, since the fs/ext4/*.h header files are not packaged up as
part of a distribution kernel's header files.  Since systemtap doesn't
use TP_fast_assign(), we can use a blind structure definition and then
make sure the needed header files are defined before the ext4 source
files #include the trace/events/ext4.h header file.

https://bugzilla.redhat.com/show_bug.cgi?id=512478

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-14 22:59:50 -04:00
Theodore Ts'o
a3710fd1ee ext4: fix tracepoint format string warnings
Unlike on some other architectures ino_t is an unsigned int on s390.
So add an explicit cast to avoid lots of compile warnings:

In file included from include/trace/ftrace.h:285,
                 from include/trace/define_trace.h:61,
                 from include/trace/events/ext4.h:711,
                 from fs/ext4/super.c:50:
include/trace/events/ext4.h: In function 'ftrace_raw_output_ext4_free_inode':
include/trace/events/ext4.h:12: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'ino_t'

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-17 08:50:18 -04:00
Theodore Ts'o
b3a3ca8ca0 ext4: Add new tracepoint: trace_ext4_da_write_pages()
Add a new tracepoint which shows the pages that will be written using
write_cache_pages() by ext4_da_writepages().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-31 23:13:11 -04:00
Theodore Ts'o
98a56ab382 ext4: Fix spelling typo in the trace format for trace_ext4_da_writepages()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-17 08:48:28 -04:00
Jens Axboe
1fe06ad892 writeback: get rid of wbc->for_writepages
It's only set, it's never checked. Kill it.

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16 15:16:18 +02:00
Linus Torvalds
c0c50b541a Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing/function-profiler: do not free per cpu variable stat
  tracing/events: Move TRACE_SYSTEM outside of include guard
2009-07-14 18:34:32 -07:00
Li Zefan
d0b6e04a4c tracing/events: Move TRACE_SYSTEM outside of include guard
If TRACE_INCLDUE_FILE is defined, <trace/events/TRACE_INCLUDE_FILE.h>
will be included and compiled, otherwise it will be
<trace/events/TRACE_SYSTEM.h>

So TRACE_SYSTEM should be defined outside of #if proctection,
just like TRACE_INCLUDE_FILE.

Imaging this scenario:

 #include <trace/events/foo.h>
    -> TRACE_SYSTEM == foo
 ...
 #include <trace/events/bar.h>
    -> TRACE_SYSTEM == bar
 ...
 #define CREATE_TRACE_POINTS
 #include <trace/events/foo.h>
    -> TRACE_SYSTEM == bar !!!

and then bar.h will be included and compiled.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A5A9CF1.2010007@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-13 10:59:55 +02:00
Aneesh Kumar K.V
43ce1d23b4 ext4: Fix mmap/truncate race when blocksize < pagesize && !nodellaoc
This patch fixes the mmap/truncate race that was fixed for delayed
allocation by merging ext4_{journalled,normal,da}_writepage() into
ext4_writepage().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-14 17:58:45 -04:00
Theodore Ts'o
627ad9fd07 ext4: Fix type warning on 64-bit platforms in tracing events header
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-20 23:21:41 -04:00
Theodore Ts'o
9bffad1ed2 ext4: convert instrumentation from markers to tracepoints
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-17 11:48:11 -04:00