Commit Graph

186 Commits

Author SHA1 Message Date
Darrick J. Wong
7136851117 mm: make snapshotting pages for stable writes a per-bio operation
Walking a bio's page mappings has proved problematic, so create a new
bio flag to indicate that a bio's data needs to be snapshotted in order
to guarantee stable pages during writeback.  Next, for the one user
(ext3/jbd) of snapshotting, hook all the places where writes can be
initiated without PG_writeback set, and set BIO_SNAP_STABLE there.

We must also flag journal "metadata" bios for stable writeout, since
file data can be written through the journal.  Finally, the
MS_SNAP_STABLE mount flag (only used by ext3) is now superfluous, so get
rid of it.

[akpm@linux-foundation.org: rename _submit_bh()'s `flags' to `bio_flags', delobotomize the _submit_bh declaration]
[akpm@linux-foundation.org: teeny cleanup]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:33 -07:00
Eric Sandeen
7e2fb2d7e6 jbd: don't wake kjournald unnecessarily
Don't send an extra wakeup to kjournald in the case where we
already have the proper target in j_commit_request, i.e. that
commit has already been requested for commit.

commit d9b0193 "jbd: fix fsync() tid wraparound bug" changed
the logic leading to a wakeup, but it caused some extra wakeups
which were found to lead to a measurable performance regression.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-14 22:50:45 +01:00
Linus Torvalds
a2013a13e6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial branch from Jiri Kosina:
 "Usual stuff -- comment/printk typo fixes, documentation updates, dead
  code elimination."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  HOWTO: fix double words typo
  x86 mtrr: fix comment typo in mtrr_bp_init
  propagate name change to comments in kernel source
  doc: Update the name of profiling based on sysfs
  treewide: Fix typos in various drivers
  treewide: Fix typos in various Kconfig
  wireless: mwifiex: Fix typo in wireless/mwifiex driver
  messages: i2o: Fix typo in messages/i2o
  scripts/kernel-doc: check that non-void fcts describe their return value
  Kernel-doc: Convention: Use a "Return" section to describe return values
  radeon: Fix typo and copy/paste error in comments
  doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
  various: Fix spelling of "asynchronous" in comments.
  Fix misspellings of "whether" in comments.
  eisa: Fix spelling of "asynchronous".
  various: Fix spelling of "registered" in comments.
  doc: fix quite a few typos within Documentation
  target: iscsi: fix comment typos in target/iscsi drivers
  treewide: fix typo of "suport" in various comments and Kconfig
  treewide: fix typo of "suppport" in various comments
  ...
2012-12-13 12:00:02 -08:00
Jan Kara
25389bb207 jbd: Fix lock ordering bug in journal_unmap_buffer()
Commit 09e05d48 introduced a wait for transaction commit into
journal_unmap_buffer() in the case we are truncating a buffer undergoing commit
in the page stradding i_size on a filesystem with blocksize < pagesize. Sadly
we forgot to drop buffer lock before waiting for transaction commit and thus
deadlock is possible when kjournald wants to lock the buffer.

Fix the problem by dropping the buffer lock before waiting for transaction
commit. Since we are still holding page lock (and that is OK), buffer cannot
disappear under us.

CC: stable@vger.kernel.org # Wherever commit 09e05d48 was taken
Signed-off-by: Jan Kara <jack@suse.cz>
2012-11-23 15:17:18 +01:00
Adam Buchbinder
48fc7f7e78 Fix misspellings of "whether" in comments.
"Whether" is misspelled in various comments across the tree; this
fixes them. No code changes.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-19 14:31:35 +01:00
Jan Kara
09e05d4805 jbd: Fix assertion failure in commit code due to lacking transaction credits
ext3 users of data=journal mode with blocksize < pagesize were occasionally
hitting assertion failure in journal_commit_transaction() checking whether the
transaction has at least as many credits reserved as buffers attached.  The
core of the problem is that when a file gets truncated, buffers that still need
checkpointing or that are attached to the committing transaction are left with
buffer_mapped set. When this happens to buffers beyond i_size attached to a
page stradding i_size, subsequent write extending the file will see these
buffers and as they are mapped (but underlying blocks were freed) things go
awry from here.

The assertion failure just coincidentally (and in this case luckily as we would
start corrupting filesystem) triggers due to journal_head not being properly
cleaned up as well.

Under some rare circumstances this bug could even hit data=ordered mode users.
There the assertion won't trigger and we would end up corrupting the
filesystem.

We fix the problem by unmapping buffers if possible (in lots of cases we just
need a buffer attached to a transaction as a place holder but it must not be
written out anyway). And in one case, we just have to bite the bullet and wait
for transaction commit to finish.

Reviewed-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-09-12 15:52:03 +02:00
Jan Kara
2e84f2641e jbd: don't write superblock when unmounting an ro filesystem
This sequence:

results in an IO error when unmounting the RO filesystem. The bug was
introduced by:

commit 9754e39c7b
Author: Jan Kara <jack@suse.cz>
Date:   Sat Apr 7 12:33:03 2012 +0200

    jbd: Split updating of journal superblock and marking journal empty

which lost some of the magic in journal_update_superblock() which
used to test for a journal with no outstanding transactions.

This is a port of a jbd2 fix by Eric Sandeen.

CC: <stable@vger.kernel.org> # 3.4.x
Signed-off-by: Jan Kara <jack@suse.cz>
2012-08-15 13:53:30 +02:00
Artem Bityutskiy
12810ad708 jbd/jbd2: nuke write_super from comments
The '->write_super' superblock method is gone, and this patch removes all the
references to 'write_super' from various jbd and jbd2.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-08-04 12:15:36 +04:00
Jan Kara
349ecd6a3c jbd: Check return value of blkdev_issue_flush()
blkdev_issue_flush() can fail. Make sure the error gets properly propagated.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 23:38:36 +02:00
Jan Kara
fd2cbd4dfa jbd: Write journal superblock with WRITE_FUA after checkpointing
If journal superblock is written only in disk's caches and other transaction
starts reusing space of the transaction cleaned from the log, it can happen
blocks of a new transaction reach the disk before journal superblock. When
power failure happens in such case, subsequent journal replay would still try
to replay the old transaction but some of it's blocks may be already
overwritten by the new transaction. For this reason we must use WRITE_FUA when
updating log tail and we must first write new log tail to disk and update
in-memory information only after that.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:37 +02:00
Jan Kara
1ce8486dcc jbd: protect all log tail updates with j_checkpoint_mutex
There are some log tail updates that are not protected by j_checkpoint_mutex.
Some of these are harmless because they happen during startup or shutdown but
updates in journal_commit_transaction() and journal_flush() can really race
with other log tail updates (e.g. someone doing journal_flush() with someone
running cleanup_journal_tail()). So protect all log tail updates with
j_checkpoint_mutex.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:36 +02:00
Jan Kara
9754e39c7b jbd: Split updating of journal superblock and marking journal empty
There are three case of updating journal superblock. In the first case, we want
to mark journal as empty (setting s_sequence to 0), in the second case we want
to update log tail, in the third case we want to update s_errno. Split these
cases into separate functions. It makes the code slightly more straightforward
and later patches will make the distinction even more important.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:36 +02:00
Jan Kara
2db938bee3 jbd: Refine commit writeout logic
Currently we write out all journal buffers in WRITE_SYNC mode. This improves
performance for fsync heavy workloads but hinders performance when writes
are mostly asynchronous, most noticably it slows down readers and users
complain about slow desktop response etc.

So submit writes as asynchronous in the normal case and only submit writes as
WRITE_SYNC if we detect someone is waiting for current transaction commit.

I've gathered some numbers to back this change. The first is the read latency
test. It measures time to read 1 MB after several seconds of sleeping in
presence of streaming writes.

Top 10 times (out of 90) in us:
Before		After
2131586		697473
1709932		557487
1564598		535642
1480462		347573
1478579		323153
1408496		222181
1388960		181273
1329565		181070
1252486		172832
1223265		172278

Average:
619377		82180

So the improvement in both maximum and average latency is massive.

I've measured fsync throughput by:
fs_mark -n 100 -t 1 -s 16384 -d /mnt/fsync/ -S 1 -L 4

in presence of streaming reader. The numbers (fsyncs/s) are:
Before		After
9.9		6.3
6.8		6.0
6.3		6.2
5.8		6.1

So fsync performance seems unharmed by this change.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-04-11 11:12:44 +02:00
Linus Torvalds
c7c66c0cb0 Power management updates for 3.4
Assorted extensions and fixes including:
 
 * Introduction of early/late suspend/hibernation device callbacks.
 * Generic PM domains extensions and fixes.
 * devfreq updates from Axel Lin and MyungJoo Ham.
 * Device PM QoS updates.
 * Fixes of concurrency problems with wakeup sources.
 * System suspend and hibernation fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQIcBAABAgAGBQJPZww5AAoJEKhOf7ml8uNsiBYQAL9YGso7KypZhLspNxvAKuZr
 iHyme2F7OdOiUfo40DVH5tRuEsQvLOl0S+9ukWLrzQotKBsMfym05jtbGN9m6Ygh
 Z793sx3eRI3mltekJ9yrOxH6BOBDMWMkwY8ztU/X5aYDNirgJ/qtAjSK4BvWXBrz
 APeaUReVnLdaNP8SnhHfne/KPsHk++NKZvAAva7E6RwtZn4KV6bfiBPGb8yvY8pP
 m4cg1S5QEduMy+zQJ8+IlEHR91bt9spUyRwbhw6ZHCNzNeu4iEZT8DVt1O1sIRbO
 LsNcClqsd40nr781SoF8N9GmGUxlUDr46bS3FSsDkYzn8uyxGEsv00edJZtPwIm5
 7nPuYat3Ke1YsON0Kcd/wkBGXqw/Rjfp3F1bnHjpVx/0oM/6MPrFNnIwvpHspejG
 kN3770idYJ17dLckhcsbYsLdy8yirITILDzvHT0AAaZ9z4Lr9Pm56WwFZLyb/lhR
 2cqK8Bb8W9YvcVsKV8YqkyBVrygWMe+c56KoAoUBiSNxvW6LphmXFBj5QiFMs8s8
 Xh8H7xU96FKbpNMIAZ1+bpI4zgulQG4xPXI9pKbhMfjaMUgj2zQeO8/t0WlB1M0z
 +kEUcYHJnXrRrObQuHEFXZdIjy/E0fdUboMIrlLt0gm97OxnG6imPseQp6/leQkC
 t+L4Aq6TOUofUU86d4cI
 =IGhc
 -----END PGP SIGNATURE-----

Merge tag 'pm-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates for 3.4 from Rafael Wysocki:
 "Assorted extensions and fixes including:

  * Introduction of early/late suspend/hibernation device callbacks.
  * Generic PM domains extensions and fixes.
  * devfreq updates from Axel Lin and MyungJoo Ham.
  * Device PM QoS updates.
  * Fixes of concurrency problems with wakeup sources.
  * System suspend and hibernation fixes."

* tag 'pm-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (43 commits)
  PM / Domains: Check domain status during hibernation restore of devices
  PM / devfreq: add relation of recommended frequency.
  PM / shmobile: Make MTU2 driver use pm_genpd_dev_always_on()
  PM / shmobile: Make CMT driver use pm_genpd_dev_always_on()
  PM / shmobile: Make TMU driver use pm_genpd_dev_always_on()
  PM / Domains: Introduce "always on" device flag
  PM / Domains: Fix hibernation restore of devices, v2
  PM / Domains: Fix handling of wakeup devices during system resume
  sh_mmcif / PM: Use PM QoS latency constraint
  tmio_mmc / PM: Use PM QoS latency constraint
  PM / QoS: Make it possible to expose PM QoS latency constraints
  PM / Sleep: JBD and JBD2 missing set_freezable()
  PM / Domains: Fix include for PM_GENERIC_DOMAINS=n case
  PM / Freezer: Remove references to TIF_FREEZE in comments
  PM / Sleep: Add more wakeup source initialization routines
  PM / Hibernate: Enable usermodehelpers in hibernate() error path
  PM / Sleep: Make __pm_stay_awake() delete wakeup source timers
  PM / Sleep: Fix race conditions related to wakeup source timer function
  PM / Sleep: Fix possible infinite loop during wakeup source destruction
  PM / Hibernate: print physical addresses consistently with other parts of kernel
  ...
2012-03-21 10:15:51 -07:00
Cong Wang
8fb53c46d9 jbd: remove the second argument of k[un]map_atomic()
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:23 +08:00
Nigel Cunningham
35c80422af PM / Sleep: JBD and JBD2 missing set_freezable()
With the latest and greatest changes to the freezer, I started seeing
panics that were caused by jbd2 running post-process freezing and
hitting the canary BUG_ON for non-TuxOnIce I/O submission. I've traced
this back to a lack of set_freezable calls in both jbd and jbd2. Since
they're clearly meant to be frozen (there are tests for freezing()), I
submit the following patch to add the missing calls.

Signed-off-by: Nigel Cunningham <nigel@tuxonice.net>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-03-13 22:36:44 +01:00
Jan Kara
353b67d8ce jbd: Issue cache flush after checkpointing
When we reach cleanup_journal_tail(), there is no guarantee that
checkpointed buffers are on a stable storage - especially if buffers were
written out by log_do_checkpoint(), they are likely to be only in disk's
caches. Thus when we update journal superblock, effectively removing old
transaction from journal, this write of superblock can get to stable storage
before those checkpointed buffers which can result in filesystem corruption
after a crash.

A similar problem can happen if we replay the journal and wipe it before
flushing disk's caches.

Thus we must unconditionally issue a cache flush before we update journal
superblock in these cases. The fix is slightly complicated by the fact that we
have to get log tail before we issue cache flush but we can store it in the
journal superblock only after the cache flush. Otherwise we risk races where
new tail is written before appropriate cache flush is finished.

I managed to reproduce the corruption using somewhat tweaked Chris Mason's
barrier-test scheduler. Also this should fix occasional reports of 'Bit already
freed' filesystem errors which are totally unreproducible but inspection of
several fs images I've gathered over time points to a problem like this.

CC: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-11 13:36:57 +01:00
Linus Torvalds
ac69e09280 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2/3/4: delete unneeded includes of module.h
  ext{3,4}: Fix potential race when setversion ioctl updates inode
  udf: Mark LVID buffer as uptodate before marking it dirty
  ext3: Don't warn from writepage when readonly inode is spotted after error
  jbd: Remove j_barrier mutex
  reiserfs: Force inode evictions before umount to avoid crash
  reiserfs: Fix quota mount option parsing
  udf: Treat symlink component of type 2 as /
  udf: Fix deadlock when converting file from in-ICB one to normal one
  udf: Cleanup calling convention of inode_getblk()
  ext2: Fix error handling on inode bitmap corruption
  ext3: Fix error handling on inode bitmap corruption
  ext3: replace ll_rw_block with other functions
  ext3: NULL dereference in ext3_evict_inode()
  jbd: clear revoked flag on buffers before a new transaction started
  ext3: call ext3_mark_recovery_complete() when recovery is really needed
2012-01-09 12:51:21 -08:00
Jan Kara
0048278552 jbd: Remove j_barrier mutex
j_barrier mutex is used for serializing different journal lock operations.  The
problem with it is that e.g. FIFREEZE ioctl results in process leaving kernel
with j_barrier mutex held which makes lockdep freak out. Also hibernation code
wants to freeze filesystem but it cannot do so because it then cannot hibernate
the system because of mutex being locked.

So we remove j_barrier mutex and use direct wait on j_barrier_count instead.
Since locking journal is a rare operation we don't have to care about fairness
or such things.

CC: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:09 +01:00
Linus Torvalds
98793265b4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
  Kconfig: acpi: Fix typo in comment.
  misc latin1 to utf8 conversions
  devres: Fix a typo in devm_kfree comment
  btrfs: free-space-cache.c: remove extra semicolon.
  fat: Spelling s/obsolate/obsolete/g
  SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
  tools/power turbostat: update fields in manpage
  mac80211: drop spelling fix
  types.h: fix comment spelling for 'architectures'
  typo fixes: aera -> area, exntension -> extension
  devices.txt: Fix typo of 'VMware'.
  sis900: Fix enum typo 'sis900_rx_bufer_status'
  decompress_bunzip2: remove invalid vi modeline
  treewide: Fix comment and string typo 'bufer'
  hyper-v: Update MAINTAINERS
  treewide: Fix typos in various parts of the kernel, and fix some comments.
  clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
  gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
  leds: Kconfig: Fix typo 'D2NET_V2'
  sound: Kconfig: drop unknown symbol ARCH_CLPS7500
  ...

Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
kconfig additions, close to removed commented-out old ones)
2012-01-08 13:21:22 -08:00
Paul Bolle
90802ed9c3 treewide: Fix comment and string typo 'bufer'
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-12-06 09:53:40 +01:00
Yongqiang Yang
8c111b3f56 jbd: clear revoked flag on buffers before a new transaction started
Currently, we clear revoked flag only when a block is reused.  However,
this can tigger a false journal error.  Consider a situation when a block
is used as a meta block and is deleted(revoked) in ordered mode, then the
block is allocated as a data block to a file.  At this moment, user changes
the file's journal mode from ordered to journaled and truncates the file.
The block will be considered re-revoked by journal because it has revoked
flag still pending from the last transaction and an assertion triggers.

We fix the problem by keeping the revoked status more uptodate - we clear
revoked flag when switching revoke tables to reflect there is no revoked
buffers in current transaction any more.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-11-22 01:20:53 +01:00
Tejun Heo
a0acae0e88 freezer: unexport refrigerator() and update try_to_freeze() slightly
There is no reason to export two functions for entering the
refrigerator.  Calling refrigerator() instead of try_to_freeze()
doesn't save anything noticeable or removes any race condition.

* Rename refrigerator() to __refrigerator() and make it return bool
  indicating whether it scheduled out for freezing.

* Update try_to_freeze() to return bool and relay the return value of
  __refrigerator() if freezing().

* Convert all refrigerator() users to try_to_freeze().

* Update documentation accordingly.

* While at it, add might_sleep() to try_to_freeze().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@infradead.org>
2011-11-21 12:32:22 -08:00
Eryu Guan
8762202dd0 jbd/jbd2: validate sb->s_first in journal_get_superblock()
I hit a J_ASSERT(blocknr != 0) failure in cleanup_journal_tail() when
mounting a fsfuzzed ext3 image. It turns out that the corrupted ext3
image has s_first = 0 in journal superblock, and the 0 is passed to
journal->j_head in journal_reset(), then to blocknr in
cleanup_journal_tail(), in the end the J_ASSERT failed.

So validate s_first after reading journal superblock from disk in
journal_get_superblock() to ensure s_first is valid.

The following script could reproduce it:

fstype=ext3
blocksize=1024
img=$fstype.img
offset=0
found=0
magic="c0 3b 39 98"

dd if=/dev/zero of=$img bs=1M count=8
mkfs -t $fstype -b $blocksize -F $img
filesize=`stat -c %s $img`
while [ $offset -lt $filesize ]
do
        if od -j $offset -N 4 -t x1 $img | grep -i "$magic";then
                echo "Found journal: $offset"
                found=1
                break
        fi
        offset=`echo "$offset+$blocksize" | bc`
done

if [ $found -ne 1 ];then
        echo "Magic \"$magic\" not found"
        exit 1
fi

dd if=/dev/zero of=$img seek=$(($offset+23)) conv=notrunc bs=1 count=1

mkdir -p ./mnt
mount -o loop $img ./mnt

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-11-01 19:04:59 -04:00
Tao Ma
a212d1a71d jbd: Use WRITE_SYNC in journal checkpoint.
In journal checkpoint, we write the buffer and wait for its finish.
But in cfq, the async queue has a very low priority, and in our test,
if there are too many sync queues and every queue is filled up with
requests, and the process will hang waiting for the log space.

So this patch tries to use WRITE_SYNC in __flush_batch so that the request will
be moved into sync queue and handled by cfq timely. We also use the new plug,
sot that all the WRITE_SYNC requests can be given as a whole when we unplug it.

Reported-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-06-28 00:06:41 +02:00
Jan Kara
bb189247f3 jbd: Fix oops in journal_remove_journal_head()
journal_remove_journal_head() can oops when trying to access journal_head
returned by bh2jh(). This is caused for example by the following race:

	TASK1					TASK2
  journal_commit_transaction()
    ...
    processing t_forget list
      __journal_refile_buffer(jh);
      if (!jh->b_transaction) {
        jbd_unlock_bh_state(bh);
					journal_try_to_free_buffers()
					  journal_grab_journal_head(bh)
					  jbd_lock_bh_state(bh)
					  __journal_try_to_free_buffer()
					  journal_put_journal_head(jh)
        journal_remove_journal_head(bh);

journal_put_journal_head() in TASK2 sees that b_jcount == 0 and buffer is not
part of any transaction and thus frees journal_head before TASK1 gets to doing
so. Note that even buffer_head can be released by try_to_free_buffers() after
journal_put_journal_head() which adds even larger opportunity for oops (but I
didn't see this happen in reality).

Fix the problem by making transactions hold their own journal_head reference
(in b_jcount). That way we don't have to remove journal_head explicitely via
journal_remove_journal_head() and instead just remove journal_head when
b_jcount drops to zero. The result of this is that [__]journal_refile_buffer(),
[__]journal_unfile_buffer(), and __journal_remove_checkpoint() can free
journal_head which needs modification of a few callers. Also we have to be
careful because once journal_head is removed, buffer_head might be freed as
well. So we have to get our own buffer_head reference where it matters.

Signed-off-by: Jan Kara <jack@suse.cz>
2011-06-27 11:44:37 +02:00
Ding Dinghua
bd5c9e1854 jbd: fix a bug of leaking jh->b_jcount
journal_get_create_access should drop jh->b_jcount in error handling path

Signed-off-by: Ding Dinghua <dingdinghua@nrchpc.ac.cn>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-06-25 17:29:51 +02:00
Jan Kara
05713082ab jbd: remove dependency on __GFP_NOFAIL
The callers of start_this_handle() (or better ext3_journal_start()) are not
really prepared to handle allocation failures. Such failures can for example
result in silent data loss when it happens in ext3_..._writepage().  OTOH
__GFP_NOFAIL is going away so we just retry allocation in start_this_handle().

This loop is potentially dangerous because the oom killer cannot be invoked
for GFP_NOFS allocation, so there is a potential for infinitely looping.
But still this is better than silent data loss.

Signed-off-by: Jan Kara <jack@suse.cz>
2011-06-25 17:29:51 +02:00
Lukas Czerner
99cb1a318c jbd: Add fixed tracepoints
This commit adds fixed tracepoint for jbd. It has been based on fixed
tracepoints for jbd2, however there are missing those for collecting
statistics, since I think that it will require more intrusive patch so I
should have its own commit, if someone decide that it is needed. Also
there are new tracepoints in __journal_drop_transaction() and
journal_update_superblock().

The list of jbd tracepoints:

jbd_checkpoint
jbd_start_commit
jbd_commit_locking
jbd_commit_flushing
jbd_commit_logging
jbd_drop_transaction
jbd_end_commit
jbd_do_submit_data
jbd_cleanup_journal_tail
jbd_update_superblock_end

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-06-25 17:29:51 +02:00
Eryu Guan
c2b67735e5 jbd: Fix comment to match the code in journal_start()
journal_start returns an ERR_PTR() value rather than NULL on failure.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-05-24 00:27:53 +02:00
Tao Ma
9199e66528 jbd/jbd2: remove obsolete summarise_journal_usage.
summarise_journal_usage seems to be obsolete for a long time,
so remove it.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-05-17 13:47:42 +02:00
Jan Kara
2842bb20ee jbd: Fix forever sleeping process in do_get_write_access()
In do_get_write_access() we wait on BH_Unshadow bit for buffer to get
from shadow state. The waking code in journal_commit_transaction() has
a bug because it does not issue a memory barrier after the buffer is moved
from the shadow state and before wake_up_bit() is called. Thus a waitqueue
check can happen before the buffer is actually moved from the shadow state
and waiting process may never be woken. Fix the problem by issuing proper
barrier.

CC: stable@kernel.org
Reported-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-05-17 13:47:42 +02:00
Ted Ts'o
d9b01934d5 jbd: fix fsync() tid wraparound bug
If an application program does not make any changes to the indirect
blocks or extent tree, i_datasync_tid will not get updated.  If there
are enough commits (i.e., 2**31) such that tid_geq()'s calculations
wrap, and there isn't a currently active transaction at the time of
the fdatasync() call, this can end up triggering a BUG_ON in
fs/jbd/commit.c:

	J_ASSERT(journal->j_running_transaction != NULL);

It's pretty rare that this can happen, since it requires the use of
fdatasync() plus *very* frequent and excessive use of fsync().  But
with the right workload, it can.

We fix this by replacing the use of tid_geq() with an equality test,
since there's only one valid transaction id that is valid for us to
start: namely, the currently running transaction (if it exists).

CC: stable@kernel.org
Reported-by: Martin_Zielinski@McAfee.com
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-05-17 13:47:41 +02:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Linus Torvalds
6c51038900 Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
  Documentation/iostats.txt: bit-size reference etc.
  cfq-iosched: removing unnecessary think time checking
  cfq-iosched: Don't clear queue stats when preempt.
  blk-throttle: Reset group slice when limits are changed
  blk-cgroup: Only give unaccounted_time under debug
  cfq-iosched: Don't set active queue in preempt
  block: fix non-atomic access to genhd inflight structures
  block: attempt to merge with existing requests on plug flush
  block: NULL dereference on error path in __blkdev_get()
  cfq-iosched: Don't update group weights when on service tree
  fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
  block: Require subsystems to explicitly allocate bio_set integrity mempool
  jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  fs: make fsync_buffers_list() plug
  mm: make generic_writepages() use plugging
  blk-cgroup: Add unaccounted time to timeslice_used.
  block: fixup plugging stubs for !CONFIG_BLOCK
  block: remove obsolete comments for blkdev_issue_zeroout.
  blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
  ...

Fix up conflicts in fs/{aio.c,super.c}
2011-03-24 10:16:26 -07:00
Jens Axboe
65ab80279d jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
'write_op' was still used, even though it was always WRITE_SYNC now.
Add plugging around the cases where it submits IO, and flush them
before we end up waiting for that IO.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-17 10:56:45 +01:00
Jens Axboe
721a9602e6 block: kill off REQ_UNPLUG
With the plugging now being explicitly controlled by the
submitter, callers need not pass down unplugging hints
to the block layer. If they want to unplug, it's because they
manually plugged on their own - in which case, they should just
unplug at will.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:27 +01:00
Justin P. Mattock
3c26bdb423 jbd: Remove one to many n's in a word.
The Patch below removes one to many "n's" in a word..

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: linux-ext4@vger.kernel.org
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-02-28 21:55:58 +01:00
Uwe Kleine-König
a34f0b3139 fix comment typos concerning "consistent"
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-12-10 16:04:28 +01:00
Linus Torvalds
7d2f280e75 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (24 commits)
  quota: Fix possible oops in __dquot_initialize()
  ext3: Update kernel-doc comments
  jbd/2: fixed typos
  ext2: fixed typo.
  ext3: Fix debug messages in ext3_group_extend()
  jbd: Convert atomic_inc() to get_bh()
  ext3: Remove misplaced BUFFER_TRACE() in ext3_truncate()
  jbd: Fix debug message in do_get_write_access()
  jbd: Check return value of __getblk()
  ext3: Use DIV_ROUND_UP() on group desc block counting
  ext3: Return proper error code on ext3_fill_super()
  ext3: Remove unnecessary casts on bh->b_data
  ext3: Cleanup ext3_setup_super()
  quota: Fix issuing of warnings from dquot_transfer
  quota: fix dquot_disable vs dquot_transfer race v2
  jbd: Convert bitops to buffer fns
  ext3/jbd: Avoid WARN() messages when failing to write the superblock
  jbd: Use offset_in_page() instead of manual calculation
  jbd: Remove unnecessary goto statement
  jbd: Use printk_ratelimited() in journal_alloc_journal_head()
  ...
2010-10-27 20:13:18 -07:00
Andrea Gelmini
bcf3d0bcff jbd/2: fixed typos
"wakup"

Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-10-28 01:30:05 +02:00
Namhyung Kim
e4d5e3a497 jbd: Convert atomic_inc() to get_bh()
Convert atomic_inc(&bh->b_count) to get_bh(bh) for consistency.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-10-28 01:30:05 +02:00
Namhyung Kim
b8ea49fa9b jbd: Fix debug message in do_get_write_access()
'buffer_head' should be 'journal_head'.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-10-28 01:30:04 +02:00
Namhyung Kim
2a0e33889b jbd: Check return value of __getblk()
Fail journal creation if __getblk() returns NULL.  unlikely() is
added because it is called in a loop and we've been OK without
the check until now.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-10-28 01:30:04 +02:00
Namhyung Kim
a910eefa51 jbd: Convert bitops to buffer fns
Convert set/clear_bit(BH_JWrite, ...) to set/clear_buffer_jwrite()
for consistency.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-10-28 01:30:02 +02:00
Darrick J. Wong
dff6825e9f ext3/jbd: Avoid WARN() messages when failing to write the superblock
This fixes a WARN backtrace in mark_buffer_dirty() that occurs during unmount
when the underlying block device is removed.  This bug has been seen on System
Z when removing all paths from a multipath-backed ext3 mount; on System P when
injecting enough PCI EEH errors to make the SCSI controller go offline; and
similar warnings have been seen (and patched) with ext2/ext4.

The super block update from a previous operation has marked the buffer as in
error, and the flag has to be cleared before doing the update. Similar changes
have been made to ext4 by commit 914258bf2c.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-10-28 01:30:02 +02:00
Namhyung Kim
8117f98c05 jbd: Use offset_in_page() instead of manual calculation
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-10-28 01:30:02 +02:00
Namhyung Kim
2b23976908 jbd: Remove unnecessary goto statement
Remove goto statement which jumps to very next line. Also remove
target label because it is no longer used anywhere.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-10-28 01:30:01 +02:00
Namhyung Kim
f81e3d4564 jbd: Use printk_ratelimited() in journal_alloc_journal_head()
Use printk_ratelimited() instead of doing it manually.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-10-28 01:30:01 +02:00
Namhyung Kim
c5639bef63 jbd: Move debug message into #ifdef area
Move call to jbd_debug() into #ifdef CONFIG_JBD_DEBUG block because
'dropped' is declared there. The code could be compiled without this
change anyway, simply because jbd_debug() expands to nothing if
!CONFIG_JBD_DEBUG but IMHO it doesn't look good in general.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-10-28 01:30:01 +02:00